
New Liquid AI Model Hyena Edge Outperforms Transformer in Efficiency

Liquid AI has launched a new "liquid" edge model called Hyena Edge, which outperformed traditional Transformers in both efficiency and quality in initial tests.

Conventional architecture design relies heavily on human expertise and intuition, while automated search methods are constrained by the search spaces they can cover. The STAR framework, grounded in the theory of linear input-varying (LIV) systems, addresses both limitations. LIV theory is a foundational principle that can uniformly describe the computational units commonly found in deep learning, including different attention mechanisms, convolutional neural networks, recurrent neural networks, and other structured operators.

Building on LIV theory, STAR introduces a novel, layered space for constructing model architectures, termed the "genome." A genome encodes multiple layers of structural information, from bottom-layer featurization methods and operator structures (which define how tokens and channels are combined) to top-layer backbone networks (which specify how LIV units are connected and composed). The resulting design is strongly hierarchical and modular.

To refine and optimize these architectural genomes, STAR employs an evolutionary algorithm with three core steps: evaluation (scoring genomes against predefined performance metrics), recombination (combining features of well-performing structures), and mutation (introducing random variations to explore new architectures). The framework supports multi-objective optimization, simultaneously weighing conflicting factors such as model size, parameter count, inference cache size, and latency, so that the generated architectures perform well across all of these metrics. According to the technical documentation, STAR excels at optimizing large language model architectures.
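The evolutionary loop described above (evaluate, recombine, mutate over a population of architecture genomes) can be sketched in a few lines. This is a hypothetical toy illustration, not Liquid AI's implementation: the operator names, the scoring function, and the quality-per-cost scalarization of the multi-objective trade-off are all invented for demonstration.

```python
import random

# Hypothetical genome: a list of operator choices, one per layer.
# The real STAR genome encodes far richer hierarchical structure.
OPERATORS = ["attention", "gated_conv", "recurrence"]

def random_genome(num_layers=4):
    return [random.choice(OPERATORS) for _ in range(num_layers)]

def evaluate(genome):
    """Toy multi-objective score: (quality, parameter cost).
    A stand-in for real training and benchmark metrics."""
    quality = sum(2 if op == "attention" else 1 for op in genome)
    cost = sum(3 if op == "attention" else 1 for op in genome)
    return quality, cost

def recombine(a, b):
    """One-point crossover of two parent genomes."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rate=0.25):
    """Randomly swap operators to explore new architectures."""
    return [random.choice(OPERATORS) if random.random() < rate else op
            for op in genome]

def evolve(population_size=16, generations=24, seed=0):
    random.seed(seed)
    population = [random_genome() for _ in range(population_size)]
    for _ in range(generations):
        # Rank by quality per unit cost -- a crude scalarization
        # of the multi-objective trade-off.
        scored = sorted(population,
                        key=lambda g: evaluate(g)[0] / evaluate(g)[1],
                        reverse=True)
        parents = scored[:population_size // 2]
        children = [mutate(recombine(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=lambda g: evaluate(g)[0] / evaluate(g)[1])
```

The population size (16) and generation count (24) mirror the figures the article reports for the Hyena Edge search, though the rest of the loop is purely illustrative.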
Whether optimizing for quality alone, for quality and parameter count, or for quality and inference cache size, the resulting architectures consistently surpassed highly optimized baselines such as Transformer++ and StripedMamba. When optimizing for quality and parameter count, 7 of 8 STAR-evolved architectures outperformed Transformer++ and hybrid models while reducing parameter counts by up to 13%. When optimizing for quality and cache size, 7 of 8 evolved architectures reduced cache size by 37% compared to hybrid models and by 90% compared to Transformer models, without compromising quality and in some cases improving it.

To design Hyena Edge, Liquid AI applied the STAR framework: starting from an initial population of 16 candidate architectures, the team ran 24 generations of evolutionary iteration. The search space was rich and diverse, incorporating various types of convolutional operators, including several inspired by the Hyena architecture.

The development of Hyena Edge represents a significant advance for edge computing, where model efficiency matters as much as quality. By leveraging STAR's optimization capabilities, Liquid AI has created a model that both performs better and consumes fewer resources, making it well suited to devices with limited computational power.
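The Hyena-inspired convolutional operators in the search space follow a gated-convolution pattern: a causal convolution branch modulated by an elementwise gate. Below is a minimal NumPy sketch of that general pattern, assuming a short depthwise kernel and a sigmoid gate; it is not Liquid AI's operator, and the function names and shapes are chosen only for illustration.

```python
import numpy as np

def causal_depthwise_conv(x, kernel):
    """Causal 1-D depthwise convolution along the sequence axis.
    x: (seq_len, channels), kernel: (k, channels)."""
    k = kernel.shape[0]
    # Left-pad so each output position sees only past and current inputs.
    padded = np.pad(x, ((k - 1, 0), (0, 0)))
    return np.stack([
        (padded[t:t + k] * kernel).sum(axis=0)
        for t in range(x.shape[0])
    ])

def gated_conv_operator(x, kernel, gate_weights):
    """Elementwise gating of a convolved branch -- the core pattern
    in Hyena-style gated-convolution operators (illustrative only)."""
    gate = 1.0 / (1.0 + np.exp(-x @ gate_weights))  # sigmoid gate
    return gate * causal_depthwise_conv(x, kernel)
```

Operators of this shape avoid attention's quadratic cost in sequence length and keep a small inference cache, which is why convolution-heavy search spaces are attractive for edge deployment.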
