
Tempered Sigmoid Activations for Deep Learning with Differential Privacy

Published 28 Jul 2020 in stat.ML, cs.CR, and cs.LG | (2007.14191v1)

Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis.

Citations (166)

Summary

  • The paper proposes tempered sigmoid activations that control gradient magnitudes for more effective differentially private training.
  • It employs DP-SGD with these activations to achieve state-of-the-art accuracies of 98.1% on MNIST, 86.1% on FashionMNIST, and 66.2% on CIFAR10.
  • The study challenges retroactive privacy adaptations, advocating for tailored model architectures to harmonize data privacy with robust performance.


In "Tempered Sigmoid Activations for Deep Learning with Differential Privacy," the authors confront a fundamental challenge in machine learning—training models that are not only accurate but also respect the privacy of the data used. The focus is on differential privacy, a framework that provides strong guarantees that the privacy of individuals in the dataset is maintained. The prevalent approach has been to apply differential privacy techniques as an afterthought, adapting existing model architectures not inherently designed for privacy-preserving contexts. This paper critiques that retroactive modification approach and posits a shift towards designing model architectures explicitly for privacy-preserving training from the outset.

Contributions

The primary contribution of the paper is the proposal of tempered sigmoids, a family of bounded activation functions tailored for differentially private deep learning. Unlike the unbounded ReLU activations in common use, which allow activation (and hence gradient) magnitudes to explode during private training, tempered sigmoids keep these magnitudes under control by construction. This supports tighter sensitivity bounds, which are integral to achieving better privacy guarantees under the differential privacy framework.
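Concretely, the tempered sigmoid family can be written as φ(x) = s·σ(T·x) − o, where s scales the output range, T is an inverse temperature, and o is an offset; tanh is recovered as the special case s = 2, T = 2, o = 1. A minimal sketch (parameter names follow the paper's s, T, o notation; the function name itself is illustrative):

```python
import math

def tempered_sigmoid(x, s=2.0, T=2.0, o=1.0):
    """Tempered sigmoid: phi(x) = s * sigmoid(T * x) - o.

    The output is bounded in (-o, s - o), regardless of how large |x| grows.
    The defaults s=2, T=2, o=1 recover tanh exactly.
    """
    return s / (1.0 + math.exp(-T * x)) - o
```

Because the output range is fixed by s and o alone, these parameters can be tuned to trade off expressiveness against the activation bound.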

The authors demonstrate both analytically and experimentally that tempered sigmoids can lead to improved privacy and utility trade-offs. Notably, they achieve state-of-the-art accuracy on three benchmark datasets: MNIST, FashionMNIST, and CIFAR10. By integrating these new activations, they report not requiring modifications to the learning procedure fundamentals or the differential privacy analysis, simplifying their adoption in practice.

Experimental Insights

The experimental framework utilizes DP-SGD, a private variant of stochastic gradient descent, to train models with tempered sigmoid activations. The experiments cover three datasets:

  • MNIST: The authors achieve a test accuracy of 98.1% for a privacy guarantee of $(\varepsilon, \delta) = (2.93, 10^{-5})$, which surpasses the previous best reported accuracy.
  • FashionMNIST: Using tempered sigmoids, the authors report a test accuracy of 86.1% for $(\varepsilon, \delta) = (2.7, 10^{-5})$, representing a notable increase over previous results.
  • CIFAR10: The accuracy achieved is 66.2% at $(\varepsilon, \delta) = (7.53, 10^{-5})$, improving upon past efforts under similar privacy constraints.

Across these datasets, the use of tempered sigmoids, and in particular tanh (itself a member of the tempered sigmoid family), shows that changing the activation function alone can significantly improve model performance under strict privacy constraints.
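The DP-SGD recipe underlying these experiments, clip each per-example gradient to a fixed L2 norm, then add Gaussian noise calibrated to that clipping norm, can be sketched in a few lines. This is a minimal NumPy illustration of the mechanism, not the authors' implementation; the function name and default hyperparameters are illustrative:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                lr=0.1, rng=None):
    """One DP-SGD update on a batch of per-example gradients.

    1. Clip each example's gradient to L2 norm <= clip_norm, bounding
       any single example's influence (the sensitivity).
    2. Sum the clipped gradients and add Gaussian noise with standard
       deviation noise_multiplier * clip_norm.
    3. Average and return the (negated, scaled) update.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    summed = np.sum(clipped, axis=0)
    noised = summed + rng.normal(0.0, noise_multiplier * clip_norm,
                                 size=summed.shape)
    return -lr * noised / len(per_example_grads)
```

The clipping step is where bounded activations help: when gradients already have moderate norms, clipping discards less signal, so the noise dominates less of the update.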

Implications and Future Directions

The theoretical and practical insights derived suggest a profound implication: neural network architectures designed for non-private scenarios might not be optimal when subjected to differential privacy constraints. The introduction of bounded activation functions such as tempered sigmoids offers a pathway to mitigate information loss due to gradient clipping and noise addition in DP-SGD, providing architects of machine learning models with a new tool to enhance differentially private model training.
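The mitigation is easy to see numerically: a bounded activation such as tanh caps the signal each layer can emit, whereas ReLU passes large pre-activations through unchanged, making aggressive gradient clipping (and the attendant information loss) more likely. A toy comparison on simulated pre-activations:

```python
import numpy as np

rng = np.random.default_rng(0)
pre_acts = rng.normal(0.0, 5.0, size=10_000)  # simulated large pre-activations

relu_out = np.maximum(pre_acts, 0.0)  # unbounded: large inputs pass through
tanh_out = np.tanh(pre_acts)          # bounded: magnitude capped at 1

print("max |ReLU output|:", np.abs(relu_out).max())
print("max |tanh output|:", np.abs(tanh_out).max())  # never exceeds 1
```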

Future research directions could explore optimizing other architectural elements for privacy-preserving contexts, potentially leading to further performance gains. Additionally, investigating the parameterization of tempered activations on more complex architectures and training regimes could yield insights into their scalability and adaptability for wider applications. The concept can also inspire further exploration into other types of bounded functions and their impact on privacy and utility in deep learning.

In conclusion, the adoption of tempered sigmoids in privacy-preserving learning represents a significant step toward reconciling utility with the rigorous privacy demands of modern machine learning applications. This shift in architectural design philosophy promises further advances in how private data is leveraged for machine learning without compromising model efficacy.
