LMArena CTO Wei-Lin Chiang on AI model battles, Google’s Nano Banana, and the future of real-world AI evaluation
LMArena, the platform that lets users pit AI models against each other in real-world tests, has grown into a major hub for evaluating artificial intelligence. Founded in 2023 by researchers from UC Berkeley, including CTO Wei-Lin Chiang, the platform began as a research project called Chatbot Arena. Its mission was simple: let people test and compare AI models through direct interaction, rather than relying solely on traditional benchmarks. Chiang explained that the project emerged when models like ChatGPT and Meta's Llama 1 were new and there was no clear way to determine which was better, so the team set out to build a community-driven evaluation system.

The mechanism is straightforward: users submit prompts, compare responses from two anonymous models, and vote on which one performs better. These pairwise votes feed a dynamic leaderboard that reflects real user preferences. Today, LMArena has over 3 million monthly users.

A major traffic spike came in August, when a mysterious AI model named Nano Banana went viral for its impressive text-to-image generation and image-editing capabilities. The model quickly rose to #1 on LMArena's image-generation leaderboard, and it was later confirmed to be Google's Gemini 2.5 Flash Image.

Chiang highlighted that the platform's strength lies in its focus on real-world use cases. While traditional benchmarks measure specific technical skills, LMArena captures how models perform in practical situations. In coding, for example, Claude ranks highest, while Gemini leads in creativity; on vision tasks, Gemini and the GPT series excel, and in image generation Gemini continues to dominate. The platform also supports multimodal evaluation: LMArena recently launched WebDev, a benchmark that tests a model's ability to build functional websites from prompts, helping developers prototype faster. Chiang emphasized the need for benchmarks grounded in actual work, not just technical performance.
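To make the vote-to-leaderboard pipeline concrete: platforms like LMArena aggregate pairwise votes into ratings using Elo- or Bradley-Terry-style models. The sketch below is a minimal illustrative Elo update, not LMArena's actual methodology, and the model names and vote data are hypothetical.

```python
def elo_update(ratings, winner, loser, k=32, scale=400, base=1500):
    """Apply one pairwise vote to an Elo-style rating table.

    ratings: dict mapping model name -> current rating (defaults to `base`).
    winner/loser: the two models from a single head-to-head comparison.
    """
    ra = ratings.get(winner, base)
    rb = ratings.get(loser, base)
    # Expected score of the winner under the logistic Elo model.
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
    # Shift both ratings by the surprise of the outcome.
    ratings[winner] = ra + k * (1.0 - expected)
    ratings[loser] = rb - k * (1.0 - expected)
    return ratings

# Hypothetical votes: each tuple is (winner, loser) from one comparison.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = {}
for w, l in votes:
    elo_update(ratings, w, l)

# Sort descending by rating to form the leaderboard.
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

As more votes arrive, the ratings shift continuously, which is what makes the leaderboard "dynamic" rather than a one-shot benchmark score.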
Big Tech companies like Google, Meta, and OpenAI use LMArena not just for exposure but for valuable feedback: when they submit their models, they receive detailed reports on how their systems rank across different tasks. LMArena also shares anonymized data and tools with the public to foster transparency and collaboration.

Chiang believes the future of AI lies in omni models, systems that unify modalities like text, vision, and audio into a single framework. He sees this as a key trend, especially as Meta's new Superintelligence Labs is reportedly working on such a model.

Despite a recent MIT study suggesting many companies aren't seeing ROI from AI investments, Chiang remains optimistic. He argues that AI's real value comes from practical applications, such as helping doctors or lawyers save time. LMArena aims to bridge the gap by collecting data on how AI is used across industries, with plans to expand into law, medicine, and education.

Ultimately, the platform's goal is to make AI evaluation open, transparent, and driven by real users. By empowering communities to judge models on real performance, LMArena helps shape the future of AI, not just in labs but in everyday work.