Unlocking High-Accuracy Differentially Private Image Classification through Scale

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8 \cdot 10^{-7})-DP. Additionally, we achieve 86.7% top-1 accuracy under (8, 8 \cdot 10^{-7})-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
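The noise-injection mechanism the abstract refers to can be sketched as follows: DP-SGD clips each per-example gradient to a fixed L2 norm and adds isotropic Gaussian noise calibrated to that clipping norm before averaging. This is a minimal NumPy illustration of a single aggregation step; the function name, defaults, and hyper-parameter values are illustrative and not the paper's actual settings.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD gradient aggregation step (illustrative sketch).

    per_example_grads: array of shape (batch_size, dim), one gradient
    row per training example.
    """
    rng = rng or np.random.default_rng(0)
    # Clip each per-example gradient to L2 norm at most `clip_norm`.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise whose scale is proportional to the clipping norm,
    # then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)
```

Because the noise standard deviation depends only on `clip_norm` and `noise_multiplier` (not on the data), the noisy sum satisfies a DP guarantee; note that the noise vector's dimension equals the model dimension, which is the source of the concern about large models that the abstract rebuts.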