Google's DeepMind and OpenAI's Models Achieve Gold Medal Performance in International Math Olympiad, Sparking Debate Over AI Transparency

Artificial intelligence models developed by Google's DeepMind and OpenAI have added a new achievement to their lists: they matched the performance of top high school students in mathematics at this year's International Mathematical Olympiad (IMO). The prestigious competition, known for its rigorous and challenging exams, attracts the brightest young minds from around the globe. Participants sit two 4.5-hour exams over two days, each consisting of three complex, multi-step problems. Both companies' models solved five of the six problems perfectly, scoring 35 of a possible 42 points, enough for a gold medal. Of the 630 students who competed, 67 reached the same standard.

There was, however, a notable difference in how the two companies approached the competition. DeepMind was officially invited to participate and announced its gold medal on Monday, in step with the IMO's release of the official student results. OpenAI did not enter the IMO; instead, it ran its model on the publicly available problems and announced its gold-level performance over the weekend, contrary to the IMO's request that companies wait until the official results were posted. Because OpenAI's participation was unofficial, its claim cannot be independently verified by the IMO.

The models tackled the exam under the same conditions as the students: each was given 4.5 hours per paper, with no external tools or internet access. Notably, neither company used a specialized math model for the task; both relied on general-purpose AI, which marks a significant advance in the capabilities of these systems.

Skepticism remains, particularly because publicly available models performed far worse. When researchers tested the same problems with Gemini 2.5 Pro, Grok-4, and OpenAI's o4, none scored higher than 13 points, well short of the 19 required for a bronze medal. The gap highlights the advanced capabilities of the proprietary models from DeepMind and OpenAI, and raises the question of why similarly refined models are not more widely accessible.

Despite the controversy, two takeaways emerge. First, AI models in laboratory settings are becoming increasingly adept at complex reasoning problems, suggesting significant progress in the field. Second, OpenAI's decision to announce its results early drew criticism for overshadowing the achievements of the young students who participated in good faith. The episode underscores the rapid evolution of AI, but it also raises ethical and practical questions about the transparency and accessibility of cutting-edge models, questions the field will need to address if AI is to benefit a broader range of users and applications.
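For readers checking the arithmetic, below is a minimal sketch of the scoring described above. It assumes the standard IMO grading scheme of six problems worth up to 7 points each (giving the 42-point maximum), together with the cutoffs quoted in this article, 35 for gold and 19 for bronze; actual cutoffs vary from year to year.

```python
# Minimal sketch of the IMO scoring arithmetic quoted in the article.
# Assumptions: six problems graded 0-7 each (standard IMO practice),
# and this year's quoted cutoffs of 35 for gold and 19 for bronze.

POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
MAX_SCORE = POINTS_PER_PROBLEM * NUM_PROBLEMS  # 42 points total

GOLD_CUTOFF = 35    # gold threshold cited above
BRONZE_CUTOFF = 19  # bronze threshold cited above

# Five perfect solutions out of six lands exactly on the gold cutoff.
reported_model_score = 5 * POINTS_PER_PROBLEM  # 35
assert reported_model_score >= GOLD_CUTOFF

# Best score reported for the publicly available models.
best_public_score = 13
assert best_public_score < BRONZE_CUTOFF  # short of even a bronze medal

print(f"max={MAX_SCORE}, model={reported_model_score}, "
      f"public_best={best_public_score}")
```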
