7 hours ago

Hong Liu, Jiaqi Zhang, Chao Wang, Xing Hu, Linkun Lyu, Jiaqi Sun, Xurui Yang, Bo Wang, Fengcun Li, Yulei Qian, and 6 more

Abstract

While Mixture-of-Experts (MoE) architectures have become the standard for sparsity scaling in large language models, they increasingly face diminishing returns and system-level bottlenecks. In this work, we explore embedding scaling as a potent dimension for scaling sparsity, orthogonal to expert scaling. Through comprehensive analysis and experiments, we identify specific regimes where embedding scaling achieves a superior Pareto frontier compared to expert scaling. We systematically characterize the critical architectural factors that govern this efficacy, ranging from parameter budgeting to the interplay with model width and depth. Moreover, by integrating tailored system optimizations and speculative decoding, we convert this sparsity into tangible inference speedups. Guided by these insights, we introduce LongCat-Flash-Lite, a 68.5B-parameter model with ∼3B activated parameters, trained from scratch. Despite allocating over 30B parameters to embeddings, LongCat-Flash-Lite not only surpasses parameter-equivalent MoE baselines but is also highly competitive with existing models of comparable scale, particularly in agentic and coding domains.
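Why can an embedding table count as "sparse" parameters in the same sense as MoE experts? A lookup reads only one row per token, so growing the table adds total parameters while leaving per-token activated parameters at one row's width; a top-k MoE layer likewise touches only k experts per token. The sketch below is a minimal bookkeeping illustration under that assumption, not the paper's embedding design. All function names and the example sizes (1,000,000 rows, 4096 dimensions, 64 experts, top-2, 50M parameters per expert) are hypothetical. The last lines only restate the abstract's approximate figures (∼3B of 68.5B activated, over 30B in embeddings) as ratios; the authors may count activated parameters differently.

```python
# Hypothetical bookkeeping sketch: total vs. per-token-activated parameters.
# Sizes used below are illustrative assumptions, not values from the paper.

def embedding_params(num_rows: int, dim: int) -> tuple[int, int]:
    """Return (total, activated-per-token) parameters of an embedding table.

    A lookup reads a single row of `dim` values per token, regardless of
    how many rows the table holds.
    """
    return num_rows * dim, dim


def moe_params(num_experts: int, top_k: int, params_per_expert: int) -> tuple[int, int]:
    """Return (total, activated-per-token) parameters of one top-k MoE layer."""
    return num_experts * params_per_expert, top_k * params_per_expert


def percent(part: float, whole: float) -> float:
    """Share of `whole` taken by `part`, in percent, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    emb_total, emb_active = embedding_params(num_rows=1_000_000, dim=4096)
    moe_total, moe_active = moe_params(num_experts=64, top_k=2, params_per_expert=50_000_000)
    print(f"embedding: {emb_total:,} total, {emb_active:,} per token")
    print(f"MoE layer: {moe_total:,} total, {moe_active:,} per token")

    # Approximate figures quoted in the abstract (billions of parameters).
    print("activated share (%):", percent(3, 68.5))   # ~3B of 68.5B
    print("embedding share (%):", percent(30, 68.5))  # "over 30B" of 68.5B
```

The point of the comparison is that the two mechanisms trade total parameters for per-token compute in different ways: the embedding table's activated count does not grow with its row count, while an MoE layer's grows with top_k and expert size. Memory footprint and lookup bandwidth still grow with the table, which is one reason the abstract pairs embedding scaling with system optimizations.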

Scaling Embeddings Outperforms Scaling Experts in Language Models