
When do Convolutional Neural Networks Stop Learning? (2403.02473v1)

Published 4 Mar 2024 in cs.CV

Abstract: Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision tasks such as image classification, detection, segmentation, and medical image analysis. In general, an arbitrary number of epochs is used to train such networks. In a single epoch, the entire training set, divided into mini-batches, is fed to the network. In practice, the validation error, together with the training loss, is used to estimate the network's generalization, which indicates its optimal learning capacity. Current practice is to stop training when the training loss keeps decreasing while the gap between training and validation error (i.e., the generalization gap) widens, in order to avoid overfitting. However, this is a trial-and-error-based approach, which raises a critical question: is it possible to estimate when a neural network stops learning from the training data alone? This work introduces a hypothesis that analyzes the data variation across all the layers of a CNN variant to anticipate its near-optimal learning capacity. During training, we use this hypothesis to anticipate the near-optimal learning capacity of a CNN variant without using any validation data. The hypothesis can be deployed as a plug-and-play component in any existing CNN variant without introducing additional trainable parameters. We test it on six CNN variants and three general image datasets (CIFAR10, CIFAR100, and SVHN); across these variants and datasets, our hypothesis saves 58.49% of training computation time on average. We further evaluate it on ten medical image datasets and compare against the MedMNIST-V2 benchmark, saving approximately 44.1% of computation time without losing accuracy.
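The abstract describes the stopping rule only at a high level: track how the data representation varies across the layers during training and stop once that variation stabilizes, without consulting a validation set. The sketch below illustrates that idea in PyTorch; the standard-deviation-of-activations statistic, the plateau test, and all names (`layer_variation`, `should_stop`, `patience`, `tol`) are illustrative assumptions, not the authors' published criterion.

```python
import torch
import torch.nn as nn

def layer_variation(model: nn.Module, batch: torch.Tensor) -> list[float]:
    """Collect one scalar statistic per conv/linear layer via forward hooks.
    The paper monitors 'data variation across all the layers'; the exact
    statistic is not given in the abstract, so the standard deviation of
    each layer's activations is used here as a stand-in."""
    stats: list[float] = []
    hooks = []
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            hooks.append(m.register_forward_hook(
                lambda _m, _inp, out: stats.append(out.detach().std().item())))
    with torch.no_grad():
        model(batch)  # one forward pass to populate the statistics
    for h in hooks:
        h.remove()
    return stats

def should_stop(history: list[list[float]],
                patience: int = 5, tol: float = 1e-3) -> bool:
    """Hypothetical plateau test: stop once every layer's statistic has
    changed by less than `tol` for `patience` consecutive epochs."""
    if len(history) <= patience:
        return False
    recent = history[-(patience + 1):]
    deltas = [max(abs(a - b) for a, b in zip(recent[i], recent[i + 1]))
              for i in range(patience)]
    return all(d < tol for d in deltas)
```

In a training loop, one would append `layer_variation(model, probe_batch)` to a history list after each epoch and break out of training when `should_stop(history)` returns True, replacing the usual validation-gap check.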

References (41)
1. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
2. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
3. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
4. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
5. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019)
6. Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020)
7. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
8. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
9. Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019)
10. Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: Concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019)
11. Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE Systems Journal 8(3), 965–979 (2013)
12. Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE Journal of Biomedical and Health Informatics 21(1), 162–171 (2015)
13. Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020)
14. Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020)
15. Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging 35(5), 1299–1312 (2016)
16. Piergiovanni, A., Ryoo, M.S.: AViD dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020)
17. Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: An empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). 
http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. 
[2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Belkin et al. [2019] Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. 
IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. 
Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. 
[2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. 
Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. 
Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. 
[2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  3. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015) Simonyan and Zisserman [2014] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Belkin et al. [2019] Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Belkin et al. [2019] Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. 
Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. 
[2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. 
[2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). 
http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. 
[2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. 
[2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  5. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. 
[2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). 
PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. 
[2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
6. Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020)
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  9. Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. 
[2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. 
In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. 
[1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  10. Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. 
[2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. 
Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  12. Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. 
[2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  13. Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. 
In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 
8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
14. Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020)
15. Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging 35(5), 1299–1312 (2016)
16. Piergiovanni, A., Ryoo, M.S.: AViD dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020)
17. Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., van der Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
15. Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging 35(5), 1299–1312 (2016)
16. Piergiovanni, A., Ryoo, M.S.: AViD dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems 33 (2020)
17. Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems 33 (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems 33 (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
16. Piergiovanni, A., Ryoo, M.S.: AViD dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020)
17. Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
17. Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., van der Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  18. Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
  19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
  20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
  21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
  22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
  23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
  24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
  25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
  26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
  27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
  28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077. PMLR (2016)
  29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
  30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
  31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
  32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
  34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
  35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
  36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
  38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
  39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
  41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
19. Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
20. Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Van der Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
21. Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
22. Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
34. Huang, G., Liu, S., Van Der Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
22. Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
23. Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  24. Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
  25. Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
  26. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
  27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
  28. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
  29. Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
  30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
  31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
  32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  33. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
  34. Huang, G., Liu, S., Van der Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
  35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
  36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
  38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
  39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
  41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  30. Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
  31. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
  32. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  33. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
  34. Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
  35. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
  36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  37. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  38. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  39. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  40. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
  41. Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
