Fast Ensembling with Diffusion Schrödinger Bridge (2404.15814v1)
Abstract: The Deep Ensemble (DE) approach is a straightforward technique for enhancing the performance of deep neural networks: multiple copies of a network are trained from different initializations and converge to different local optima. A limitation of this methodology, however, is its high computational overhead at inference, which arises from the need to store many sets of learned parameters and to execute an individual forward pass for each of them. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Grounded in the theory of the Schrödinger bridge, DBN directly learns to simulate a Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain ensemble predictions without invoking a forward pass through every ensemble model. By substituting the heavy ensemble with this lightweight network, we achieve inference at reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.
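The mechanism described in the abstract can be made concrete with a small sketch. The following is a minimal illustration, not the authors' implementation: it assumes the bridge operates on class logits, uses a tiny time-conditioned MLP as the drift network, and integrates the learned SDE with a few Euler-Maruyama steps starting from a single ensemble member's output. All names here (`BridgeNet`, `bridge_predict`, `n_steps`, `sigma`) are hypothetical; the actual architecture, conditioning, and training objective are specified in the paper and the linked repository.

```python
# Minimal sketch (hypothetical, not the authors' code) of bridging a single
# ensemble member's output distribution toward the full ensemble's output
# distribution with a learned SDE, as described in the abstract.
import torch
import torch.nn as nn


class BridgeNet(nn.Module):
    """Tiny time-conditioned MLP approximating the SDE drift on logits."""

    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, y: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # y: (B, C) current logits; t: (B, 1) diffusion time in [0, 1].
        return self.net(torch.cat([y, t], dim=-1))


@torch.no_grad()
def bridge_predict(backbone, bridge, x, n_steps: int = 10, sigma: float = 0.1):
    """One backbone pass, then Euler-Maruyama integration of the learned
    SDE from the single member's logits toward ensemble-like logits."""
    y = backbone(x)  # output of a single ensemble member
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((y.shape[0], 1), k * dt, device=y.device)
        drift = bridge(y, t)
        noise = sigma * (dt ** 0.5) * torch.randn_like(y)
        y = y + drift * dt + noise  # one stochastic bridge step
    return y.softmax(dim=-1)  # approximate ensemble prediction
```

In such a setup, training would supervise the drift network on paired samples, with a single member's outputs as the source distribution and the ensemble's outputs as the target, in the spirit of diffusion and Schrödinger-bridge objectives. The payoff at inference is a single backbone pass plus a handful of cheap bridge steps, instead of one forward pass per ensemble member.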