0/1 Deep Neural Networks via Block Coordinate Descent

The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it outputs 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the absence of useful subgradient information) have impeded its development for several decades. Although there is an impressive body of work on designing DNNs with continuous activation functions that can be regarded as surrogates of the step function, the step function still possesses some advantageous properties, such as complete robustness to outliers and the ability to attain the best learning-theoretic guarantees of predictive accuracy. Hence, in this paper, we aim to train DNNs that use the step function as an activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descent (BCD) method. Moreover, we derive closed-form solutions for the sub-problems of BCD and establish its convergence properties. Furthermore, we integrate $\ell_{2,0}$-regularization into 0/1 DNNs to accelerate the training process and compress the network scale. As a result, the proposed algorithm achieves desirable performance on classifying the MNIST, FashionMNIST, Cifar10, and Cifar100 datasets.
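To make the abstract's starting point concrete, the following is a minimal NumPy sketch of a forward pass through a hypothetical two-layer network with the 0/1 (Heaviside) activation; the layer sizes and names (`step`, `forward`) are illustrative, not from the paper. It also shows why gradient-based backpropagation fails here: the activation is piecewise constant, so its derivative is zero almost everywhere.

```python
import numpy as np

def step(x):
    # 0/1 (Heaviside) activation: 1 for positive inputs, 0 otherwise.
    # Piecewise constant, hence discontinuous with no useful (sub)gradients.
    return (x > 0).astype(x.dtype)

def forward(x, W1, b1, W2, b2):
    # Illustrative two-layer 0/1 DNN forward pass.
    h = step(W1 @ x + b1)   # hidden activations are exactly 0 or 1
    return W2 @ h + b2      # linear output layer

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)
print(forward(x, W1, b1, W2, b2))
```

Because such a network cannot be trained by backpropagation, the paper turns to block coordinate descent. The sketch below shows the generic BCD idea only, on a toy least-squares problem where each block update has a closed form; it is not the authors' algorithm, whose sub-problems and updates are derived in the paper itself.

```python
import numpy as np

# Toy separable problem: min over (u, v) of ||A u + B v - y||^2.
rng = np.random.default_rng(1)
A, B = rng.standard_normal((20, 5)), rng.standard_normal((20, 5))
y = rng.standard_normal(20)
u, v = np.zeros(5), np.zeros(5)
for _ in range(50):
    # Alternate exact minimization over one block with the other fixed;
    # each update is a closed-form least-squares solve.
    u = np.linalg.lstsq(A, y - B @ v, rcond=None)[0]
    v = np.linalg.lstsq(B, y - A @ u, rcond=None)[0]
print(np.linalg.norm(A @ u + B @ v - y))  # residual decreases monotonically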