
Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni

Abstract

Over the last decades, developing more powerful neural architectures and simultaneously designing optimization algorithms to effectively train them have been the core of research efforts to enhance the capability of machine learning models. Despite recent progress, particularly in developing Language Models (LMs), there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find "effective solutions." In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a model as a set of nested, multi-level, and/or parallel optimization problems, each with its own "context flow." NL reveals that existing deep learning methods learn from data by compressing their own context flow, and it explains how in-context learning emerges in large models. NL suggests a path (a new dimension for deep learning) to design more expressive learning algorithms with more "levels," resulting in higher-order in-context learning abilities. In addition to its neuroscientifically plausible and mathematically white-box nature, we advocate for its importance by presenting three core contributions: (1) Deep Optimizers: Based on NL, we show that well-known gradient-based optimizers (e.g., Adam, SGD with Momentum, etc.) are in fact associative memory modules that aim to compress the gradients with gradient descent. Building on this insight, we present a set of more expressive optimizers with deep memory and/or more powerful learning rules; (2) Self-Modifying Titans: Taking advantage of NL's insights on learning algorithms, we present a novel sequence model that learns how to modify itself by learning its own update algorithm; and (3) Continuum Memory System: We present a new formulation for a memory system that generalizes the traditional viewpoint of "long-term/short-term memory." Combining our self-modifying sequence model with the continuum memory system, we present a learning module, called HoPE, showing promising results in language modeling, continual learning, and long-context reasoning tasks.
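To make the "optimizers as associative memory" claim concrete, here is a minimal sketch (not code from the paper) of the simplest instance: an exponential-moving-average momentum buffer, as used for Adam's first moment, is exactly one gradient-descent step on an inner objective that compresses the incoming gradient. The inner loss L(m) = ½‖m − g‖² and the name `inner_lr` are illustrative assumptions chosen to make the equivalence exact.

```python
# Sketch: EMA momentum viewed as a one-step associative memory update.
# Assumes inner loss L(m) = 0.5 * ||m - g||^2, whose gradient is (m - g).
import numpy as np

rng = np.random.default_rng(0)
beta = 0.9            # usual momentum coefficient
inner_lr = 1 - beta   # step size of the inner (memory) gradient descent

m_ema = np.zeros(4)   # momentum buffer, standard EMA form
m_mem = np.zeros(4)   # same buffer, viewed as an associative memory

for _ in range(100):
    g = rng.normal(size=4)             # incoming gradient (the "context flow")
    # 1) Standard EMA momentum update:
    m_ema = beta * m_ema + (1 - beta) * g
    # 2) One gradient-descent step on the inner compression loss:
    m_mem = m_mem - inner_lr * (m_mem - g)

assert np.allclose(m_ema, m_mem)  # the two views coincide exactly
```

Under this reading, the paper's "deep optimizers" arise by replacing the linear memory m with a deeper parametric memory, or the squared-error inner loss with a more powerful learning rule; the sketch above only demonstrates the base case.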
