
The Real-World Application of AI: From Roleplay to Agentic Workflows

In December 2025, Andreessen Horowitz (a16z) and the AI inference platform OpenRouter released a report titled State of AI, based on over 100 trillion tokens of real-world user interactions processed through OpenRouter's infrastructure. The timing coincided with the one-year anniversary of OpenAI's o1 reasoning model launch, a milestone that marked a pivotal shift in AI from simple forward-pass inference to multi-step internal reasoning. The past year has brought deeper transformation than many anticipated.

What sets this report apart is its data source. Unlike proprietary model providers such as OpenAI or Anthropic, OpenRouter acts as a unified interface connecting users to hundreds of language models. This gives it unusual visibility into actual usage patterns: which models people choose, what tasks they perform, and how much they spend, all without accessing raw prompts or outputs. The analysis rests entirely on metadata (timestamps, model selection, token counts, and tool invocation status), which limits precision in certain classifications but enables behavioral insights at very large scale.

One of the most striking findings is the rapid rise of open-source (OSS) models. Their share of total token consumption climbed steadily over the year, reaching nearly 30% by late 2025, up from near insignificance a year earlier. China's open-source models played a major role in this surge. Names like DeepSeek, Alibaba's Qwen, and Moonshot AI's Kimi series were virtually unknown to Western developers a year ago but now rank prominently on OpenRouter's usage charts. Chinese OSS models accounted for about 13% of weekly usage on average, nearly matching non-Chinese open-source models at 13.7%. The result is a dual-track ecosystem: closed-source systems maintain dominance in high-reliability enterprise and regulated applications, while open-source models thrive on cost efficiency and customization.
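The "unified interface" described above works because OpenRouter exposes an OpenAI-compatible chat-completions API: the same request shape is routed to whichever model the `model` field names. A minimal sketch of how such a request is assembled (the endpoint path follows OpenRouter's published OpenAI-compatible schema; the API key and model slug here are placeholders, not recommendations):

```python
import json

# OpenRouter's OpenAI-compatible chat endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Assemble an OpenAI-compatible chat request for OpenRouter.

    Only the `model` field changes when switching between the hundreds
    of hosted models; the rest of the payload stays identical.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # e.g. a closed model or an open-weights one
        "messages": [{"role": "user", "content": prompt}],
    }
    return OPENROUTER_URL, headers, payload

# Switching providers is a one-string change in the model slug:
url, headers, body = build_chat_request(
    "deepseek/deepseek-chat", "Summarize this diff.", api_key="sk-or-..."
)
print(json.dumps(body, indent=2))
```

Actually sending it is a single `requests.post(url, headers=headers, json=body)`, and the response mirrors the OpenAI chat-completions format; that uniformity is what lets OpenRouter observe model choice and token counts without touching prompt content.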
Within this ecosystem, medium-sized models (15B to 70B parameters) have emerged as the fastest-growing segment. Small models (under 15B) continue to decline in usage despite increasing availability. Large models (over 70B) remain competitive but no single one dominates: Qwen, Z.AI, and OpenAI's GPT-OSS series all sustain significant usage. The breakout came in November 2024 with the release of Qwen2.5 Coder 32B, followed by Mistral Small 3 and GPT-OSS 20B. Developers are increasingly seeking the sweet spot between capability and efficiency: smart enough for complex tasks, fast and affordable enough for daily use.

Unexpectedly, the dominant use case for open-source models is not productivity but roleplay. Analysis of a 0.25% sample of prompts revealed that roleplay consumed over half of all tokens used with open-source models; programming came second at 15%–20%. This challenges the assumption that LLMs are primarily tools for coding or summarization. Users are treating models as interactive storytelling partners, well suited to fiction writing, virtual companionship, and immersive simulations. Open-source models excel here thanks to fewer restrictive filters and greater flexibility for fine-tuning, enabling nuanced emotional responses and narrative continuity.

Programming tells a different story once closed-source models are included. Across all models, programming usage surged from around 11% at the start of 2025 to over 50% by year-end, becoming the most competitive and strategically vital category. Anthropic's Claude series led with over 60% share, though it recently dipped below that threshold. OpenAI's share rose from 2% to 8%, Google stabilized at around 15%, and emerging players like MiniMax, Z.AI, and Qwen are rapidly gaining ground. The report labels programming "the most strategically important category": even minor improvements in reasoning or latency can trigger weekly shifts in market share.
Another transformative trend is the rise of agentic inference: using models not just to generate text but as components in autonomous systems that plan multi-step workflows, invoke external tools, and maintain context across interactions. The evidence includes a surge in reasoning-optimized models (o1, GPT-5, Claude 4.5, Gemini 3), which now account for more than 50% of total token usage. Tool invocation rates have climbed steadily, with Claude 4.5 Sonnet leading, followed by xAI's Grok Code Fast and Z.AI's GLM 4.5.

Perhaps most telling is the explosion in sequence length. Average prompt length grew nearly fourfold, from roughly 1,500 to over 6,000 tokens, while output length tripled. This reflects a shift from open-ended generation ("Write an article") to complex, context-heavy reasoning tasks, especially in programming, where inputs often exceed 20,000 tokens. Models are evolving from creative generators into analytical engines.

Geographically, AI usage is becoming increasingly global and decentralized. While North America remains the largest market, its share dropped below 50% for much of the year. Europe held steady at 15%–20%, and Asia's share surged from roughly 13% to 31%, driven by rising enterprise adoption and the global reach of Chinese models. English dominates at 82.87% of traffic, followed by Simplified Chinese at 4.95%, with a growing presence of Russian, Spanish, and Thai.

Cost elasticity is surprisingly low: despite price reductions, usage grows only about 0.5%–0.7% per 10% price drop, indicating the market is not yet commoditized. Models fall into four quadrants: Premium Leaders (e.g., Claude 3.7 Sonnet), Efficient Giants (Gemini 2.0 Flash), Long Tail (e.g., Qwen 2 7B Instruct), and Premium Specialists (GPT-4, GPT-5 Pro). Closed-source models dominate high-value tasks while open-source absorbs price-sensitive usage, but open-source is closing the performance gap.
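That elasticity figure is just the ratio of the percentage change in usage to the percentage change in price. The report does not spell out its exact methodology, so the following is a minimal sketch using the textbook definition, plugged with the report's headline numbers:

```python
def price_elasticity(pct_change_usage: float, pct_change_price: float) -> float:
    """Own-price elasticity of demand: %change in usage / %change in price."""
    return pct_change_usage / pct_change_price

# The report's figures: a 10% price drop lifts usage by only 0.5%-0.7%.
low = price_elasticity(+0.5, -10.0)   # -0.05
high = price_elasticity(+0.7, -10.0)  # -0.07
print(low, high)
```

Magnitudes far below 1 mean demand is highly inelastic: users pick models for fit, not price, which is exactly the "not yet commoditized" claim.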
The report introduces the "Cinderella glass slipper effect": when a new frontier model perfectly matches an unsolved, high-value workload, it creates strong user lock-in. Teams build pipelines and workflows around it, making migration costly. Retention curves show this clearly: Gemini 2.5 Pro and Claude 4 Sonnet retained roughly 40% of users after five months, and models like GPT-4o Mini established unmatched stickiness after their initial launch. In contrast, Gemini 2.0 Flash and Llama 4 Maverick showed no strong retention, indicating they never achieved that pivotal fit. DeepSeek's data revealed a "boomerang effect": users tested alternatives but returned, confirming DeepSeek's edge in certain tasks.

In conclusion, the report underscores four key shifts: a multi-model ecosystem is now standard; AI use extends far beyond productivity into entertainment and companionship; agentic inference is becoming the default mode; and globalization is accelerating, with cultural and linguistic adaptability shaping future competition.

Limitations remain. The data reflects only OpenRouter's ecosystem, excluding enterprise internal use and local deployments, and geographic inference relies on billing addresses rather than verified locations. Still, the findings offer a robust, real-world lens into how AI is actually being used, and where it is headed.
