The Hidden Engine of AI: How Training Data Startups Are Fueling the Industry’s Growth Amid Rising Demand and Scrutiny

The rise of AI has sparked a quiet revolution in the world of training data, transforming what was once seen as mundane, low-value work into a high-stakes, billion-dollar industry. At the heart of this shift are companies like Mercor, Surge AI, Scale AI, and Handshake, which are no longer just data providers; they are now central players in the race to build intelligent machines.

Brendan Foody, a 22-year-old entrepreneur, launched Mercor in 2023 as a tech-driven staffing platform that helped startups hire overseas engineers. Using AI to screen resumes and conduct interviews, the company quickly scaled to $1 million in annualized revenue. But its real transformation came in early 2024, when Scale AI approached Mercor with a massive request: 1,200 software engineers to help train AI models in coding. This moment revealed a growing demand for expert human input in AI development, especially as models from OpenAI and Anthropic were being trained to write code.

Foody saw an opportunity. When the engineers he hired began reporting issues with pay and platform management (allegations that Scale has also faced in lawsuits), Foody decided to cut out the middleman. He restructured Mercor into a direct provider of high-quality, expert-labeled data, focusing on niche, high-skill tasks. By September, Mercor had hit $500 million in annualized revenue, making it the fastest-growing company in history, surpassing even Cursor, the AI coding tool. A recent funding round valued Mercor at $10 billion, and Foody and his co-founders became some of the youngest self-made billionaires.

This boom is not isolated. Surge AI, founded by data scientist Edwin Chen, has quietly become a major rival to Scale, reporting over $1 billion in revenue last year and operating with higher pay and tighter quality controls. Surge’s success comes from its focus on expert annotators (PhD-level scientists, lawyers, and engineers) rather than mass crowdsourcing.

Similarly, Handshake AI, built on a platform for college career services, saw demand triple after Meta’s deal to invest in Scale AI, growing from three to 150 employees in five months. Other companies like Turing, Labelbox, Invisible Technologies, and Micro1 have pivoted from staffing or software to become full-fledged data providers.

The reason for this surge? AI models are hitting limits. Early progress relied on massive, generic datasets. Now, breakthroughs come from specialized, human-curated data, especially in domains like coding, finance, law, and medicine. To train models effectively, labs need granular “grading rubrics” that define what a correct or high-quality response looks like. These rubrics are painstakingly built by experts and can take over 10 hours to refine. OpenAI’s medical benchmark, HealthBench, for example, includes nearly 50,000 criteria across thousands of prompts.

AI labs are also creating “reinforcement learning environments”: digital simulations where models practice tasks like sending emails, using Salesforce, or navigating enterprise software. These environments are becoming a booming market, with companies such as Snorkel AI building custom platforms for specific workflows.

Yet despite the massive spending, progress remains narrow. Models can ace coding challenges and pass bar exams, but they often invent facts, misapply logic, or fail at real-world tasks that require judgment, context, and nuance. As Joelle Pineau of Cohere notes, AI struggles with multiple, conflicting goals, something reinforcement learning isn’t built to handle. The solution? More data, more experts, more rubrics.

The irony is stark: the most human-like AI still depends on human labor. This is the paradox of modern AI: what was once called “janitorial work” is now the most valuable part of the system. And while frontier labs chase superintelligence, the real work is being done by thousands of experts around the world, paid to define what “good” looks like in every possible context.

The data industry is now a battleground of innovation, competition, and survival. Companies are fighting over talent, reputation, and trust. Scale, despite its Meta ties and legal troubles, continues to grow. Surge is preparing for a $1 billion funding round. Mercor is expanding into enterprise AI. Handshake is building a global network of PhDs and specialists. Even Uber has entered the fray, acquiring a Belgian data labeling startup to train drivers as annotators during downtime.

This isn’t just a supply chain; it’s a new economy. As Daniel Kang of the University of Illinois observes, the future of AI may not be generalization, but specialization. The need for human data isn’t shrinking; it’s expanding. And for now, the companies that provide it are the ones making money.

In a world where AI promises to automate everything, the most valuable job may not be coding or designing models, but evaluating them. As Foody puts it, “The entire economy will become a reinforcement learning environment.” And someone has to feed the machine.
