
This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype (the hypernetwork) and a phenotype (the main network). Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as a relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTMs and achieve near state-of-the-art results on a variety of sequence modelling tasks, including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.
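To make the core idea concrete, the following is a minimal sketch (not the paper's implementation; the layer sizes, embedding dimension, and class name HyperLinear are illustrative assumptions) of a hypernetwork that generates the weight matrix of a single linear layer in the main network from a learnable layer embedding, with both networks trained jointly by backpropagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch: a small hypernetwork maps a layer embedding z to the
# weights of a main-network linear layer. Sizes and names are assumptions,
# not taken from the paper.
class HyperLinear(nn.Module):
    def __init__(self, in_features, out_features, z_dim=4):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable layer embedding: the input to the hypernetwork.
        self.z = nn.Parameter(torch.randn(z_dim))
        # Hypernetwork: produces the main layer's weight matrix from z.
        self.hyper = nn.Linear(z_dim, out_features * in_features)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Generate the main-network weights on the fly, then apply them.
        w = self.hyper(self.z).view(self.out_features, self.in_features)
        return F.linear(x, w, self.bias)

# Usage: gradients flow through the generated weights into the hypernetwork,
# so the whole system is trained end-to-end.
layer = HyperLinear(8, 16)
y = layer(torch.randn(2, 8))
```

Sharing one such hypernetwork across several layers, each with its own embedding z, gives the relaxed form of weight-sharing described above: layers receive distinct weights, but those weights are produced by a common, much smaller set of learnable parameters.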