
TAN Without a Burn: Scaling Laws of DP-SGD

Tom Sander, Pierre Stock, Alexandre Sablayrolles
Abstract

Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple the privacy analysis and the experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Rényi Differential Privacy (RDP) to highlight that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 point gain in top-1 accuracy for a privacy budget of $\varepsilon = 8$.
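The TAN idea can be sketched numerically with a small Rényi-DP accountant. The snippet below is a minimal illustration, not the paper's accountant: it uses the integer-order binomial bound for the Poisson-subsampled Gaussian mechanism, the classic RDP-to-$(\varepsilon, \delta)$ conversion, and the ratio $\sigma / (q\sqrt{S})$ (noise multiplier over sampling rate times the square root of the number of steps) as a stand-in for the paper's TAN quantity. All concrete numbers (dataset size, batch sizes, noise multipliers, step count) are hypothetical; the point is that two configurations sharing this ratio end up with nearly the same privacy budget, so hyper-parameters can be searched in the cheap regime and transferred to the expensive one.

```python
import math

def sgm_rdp(q: float, sigma: float, alpha: int) -> float:
    """Upper bound on the Renyi DP of the Poisson-subsampled Gaussian mechanism
    at integer order alpha (binomial-expansion bound, computed in log space
    to avoid overflow at large orders)."""
    log_terms = [
        math.log(math.comb(alpha, k))
        + (alpha - k) * math.log1p(-q)
        + k * math.log(q)
        + (k * k - k) / (2.0 * sigma ** 2)
        for k in range(alpha + 1)
    ]
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_sum / (alpha - 1)

def dpsgd_epsilon(q: float, sigma: float, steps: int, delta: float) -> float:
    """Approximate (epsilon, delta) guarantee of DP-SGD after `steps` updates:
    compose per-step RDP linearly, then convert with
    eps = rdp + log(1/delta) / (alpha - 1), minimized over integer orders."""
    return min(
        steps * sgm_rdp(q, sigma, alpha) + math.log(1.0 / delta) / (alpha - 1)
        for alpha in range(2, 256)
    )

# Two hypothetical configurations sharing the same ratio sigma / (q * sqrt(S)):
# batch size and noise multiplier are both scaled by 4, steps are unchanged.
n, steps, delta = 50_000, 2_500, 1e-5
small = dict(q=1_024 / n, sigma=2.0, steps=steps)  # cheap run for hyper-parameter search
large = dict(q=4_096 / n, sigma=8.0, steps=steps)  # 4x batch size, 4x noise multiplier

# The two budgets come out close, illustrating that (in the non-overcharged
# regime) the guarantee is governed by the total amount of noise.
print(f"small-batch config: eps ~ {dpsgd_epsilon(**small, delta=delta):.2f}")
print(f"large-batch config: eps ~ {dpsgd_epsilon(**large, delta=delta):.2f}")
```

In this sketch the small-batch run uses a fraction of the compute of the large-batch one while receiving nearly the same privacy accounting, which is the lever the paper uses to make hyper-parameter search affordable before launching the full-scale private training run.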
