Embracing Unknown Step by Step: Towards Reliable Sparse Training in Real World (2403.20047v1)
Abstract: Sparse training has emerged as a promising method for building resource-efficient deep neural networks (DNNs) in real-world applications. However, the reliability of sparse models remains a crucial concern, particularly for detecting unknown out-of-distribution (OOD) data. This study addresses that knowledge gap by investigating the reliability of sparse training from an OOD perspective and reveals that sparse training exacerbates OOD unreliability: the lack of unknown information and the sparsity constraints hinder effective exploration of the weight space and accurate differentiation between known and unknown knowledge. To tackle these challenges, we propose a new unknown-aware sparse training method that incorporates a loss modification, an auto-tuning strategy, and a voting scheme to guide weight-space exploration and mitigate confusion between known and unknown information, without incurring significant additional cost or requiring access to additional OOD data. Theoretical insights demonstrate how our method reduces model confidence when faced with OOD samples. Empirical experiments across multiple datasets, model architectures, and sparsity levels validate the effectiveness of our method, with improvements of up to **8.4%** in AUROC while maintaining comparable or higher accuracy and calibration. This research enhances the understanding and readiness of sparse DNNs for deployment in resource-limited applications. Our code is available at: https://github.com/StevenBoys/MOON.
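The abstract does not spell out the loss modification, so the following is only a minimal NumPy sketch of the general mechanism it describes: augmenting standard cross-entropy with a predictive-entropy term so the classifier is discouraged from being overconfident, which in turn lowers confidence on OOD inputs. The function name `unknown_aware_loss` and the weight `alpha` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def unknown_aware_loss(logits, targets, alpha=0.1):
    """Hypothetical sketch: cross-entropy minus a scaled predictive-entropy bonus.

    Rewarding higher predictive entropy flattens the output distribution,
    tempering overconfidence -- the kind of effect the abstract attributes
    to its loss modification. This is NOT the paper's exact loss.
    """
    p = softmax(logits)
    n = logits.shape[0]
    # Standard cross-entropy on the true classes.
    ce = -np.log(p[np.arange(n), targets] + 1e-12).mean()
    # Mean predictive entropy; subtracting it rewards less confident outputs.
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce - alpha * entropy
```

Since the entropy term is non-negative, setting `alpha > 0` can only lower the loss of flatter predictions relative to plain cross-entropy, nudging training away from overconfident solutions.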