Shifting AI Efficiency From Model-Centric to Data-Centric Compression

The rapid advancement of large language models (LLMs) and multi-modal LLMs (MLLMs) has historically relied on model-centric scaling, driving performance gains by increasing parameter counts from millions to hundreds of billions. However, as model sizes approach hardware limits, the dominant computational bottleneck has fundamentally shifted to the quadratic cost of self-attention over long token sequences, now driven by ultra-long text contexts, high-resolution images, and extended videos. In this position paper, \textbf{we argue that the focus of research for efficient AI is shifting from model-centric compression to data-centric compression}. We position token compression as the new frontier, improving AI efficiency by reducing the number of tokens during model training or inference. Through comprehensive analysis, we first examine recent developments in long-context AI across various domains and establish a unified mathematical framework for existing model efficiency strategies, demonstrating why token compression represents a crucial paradigm shift in addressing long-context overhead. We then systematically review the research landscape of token compression, analyzing its fundamental benefits and identifying its compelling advantages across diverse scenarios. Furthermore, we provide an in-depth analysis of current challenges in token compression research and outline promising future directions. Ultimately, our work aims to offer a fresh perspective on AI efficiency, synthesize existing research, and catalyze innovative developments to address the challenges that increasing context lengths pose to the AI community's advancement.
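
To make the scaling argument concrete, the sketch below (not taken from the paper) illustrates why self-attention cost grows quadratically with sequence length and how dropping tokens shrinks that cost. The norm-based saliency score used to pick which tokens to keep is a hypothetical placeholder, not a method the authors propose.

```python
# Minimal sketch: quadratic self-attention cost vs. cost after token compression.
# The "keep the highest-norm tokens" heuristic is an illustrative assumption only.
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product attention over x of shape (n, d).

    The (n, n) score matrix is what makes cost grow quadratically with n.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # (n, n): O(n^2 * d) multiplies
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                                 # (n, d)

def compress_tokens(x, keep_ratio=0.25):
    """Keep the top fraction of tokens by L2 norm (toy saliency score)."""
    k = max(1, int(len(x) * keep_ratio))
    idx = np.argsort(np.linalg.norm(x, axis=-1))[-k:]
    return x[np.sort(idx)]                             # preserve original order

n, d = 4096, 64
x = np.random.randn(n, d).astype(np.float32)

full_cost = n * n * d                                  # pairwise-score multiplies
x_small = compress_tokens(x, keep_ratio=0.25)
compressed_cost = len(x_small) ** 2 * d

_ = self_attention(x_small)                            # attention on the reduced sequence
print(f"full attention:       {full_cost:,} mults for {n} tokens")
print(f"after 4x compression: {compressed_cost:,} mults for {len(x_small)} tokens "
      f"(~{full_cost / compressed_cost:.0f}x fewer)")
```

Because the score matrix is quadratic in token count, keeping a quarter of the tokens cuts attention compute by roughly 16x; practical token compression methods replace the toy norm heuristic with learned or attention-derived importance scores.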