ChatGPT Tackles Ancient Greek Math Puzzle with On-the-Fly Reasoning, Surprising Researchers
A recent experiment by two education researchers explored how ChatGPT-4 responded to an ancient Greek math puzzle known as "doubling the square," a problem described by Plato around 385 BCE and often regarded as one of the earliest recorded examples of mathematics education. The study, conducted by Dr Nadav Marco of the Hebrew University and the David Yellin College of Education and Professor Andreas Stylianides of the University of Cambridge, aimed to determine whether AI models like ChatGPT engage in genuine problem-solving or simply retrieve pre-existing knowledge.

The researchers presented the problem to ChatGPT-4 in a way that mimicked Socrates' teaching method, guiding the model through the puzzle step by step. They then introduced deliberate errors, contradictions, and new variations to test its adaptability. Because large language models are trained on vast text datasets and typically generate responses by predicting word sequences, the researchers expected ChatGPT to simply recall Plato's classical geometric solution: constructing a new square on the diagonal of the original.

Instead, the model took an unexpected path. Rather than using geometry, it applied an algebraic method, one that would have been unknown in Plato's era, and it persisted with this approach even when the researchers pointed out that the solution was an approximation rather than exact. Only after the researchers expressed disappointment that it could not deliver an elegant, precise geometric answer did the model produce the correct geometric solution.

Interestingly, when asked directly about Plato's original dialogue, ChatGPT demonstrated strong familiarity with the text. Had it been relying solely on memory, it would have immediately referenced the diagonal-based construction; its failure to do so suggested it was not merely recalling but attempting to reason through the problem in real time.

The researchers also tested a variant: doubling the area of a rectangle while preserving its proportions.
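For readers curious about the mathematics itself, the contrast between the algebraic and geometric routes, along with the rectangle variant, can be sketched in a few lines of Python. This is a minimal illustration; the function names are ours, not the study's.

```python
import math

def algebraic_side(s):
    """Algebraic route (unknown in Plato's era): a square with twice
    the area, 2*s**2, has side s*sqrt(2). Since sqrt(2) is irrational,
    any decimal answer is necessarily an approximation."""
    return math.sqrt(2 * s * s)

def diagonal(s):
    """Plato's geometric route: the diagonal of the original square,
    sqrt(s**2 + s**2), is exactly the side of the doubled square."""
    return math.hypot(s, s)

def double_rectangle(w, h):
    """The variant the researchers posed: scaling both sides of a
    rectangle by sqrt(2) doubles its area while preserving its
    proportions."""
    k = math.sqrt(2)
    return w * k, h * k

s = 1.0
print(algebraic_side(s))   # sqrt(2), about 1.4142
print(diagonal(s))         # the diagonal has the same length
print(diagonal(s) ** 2)    # so a square built on it has area about 2.0

w, h = 3.0, 2.0
W, H = double_rectangle(w, h)
print(W * H)               # about 12.0, double the original area of 6.0
print(W / H)               # about 1.5, the same aspect ratio as 3:2
```

The point of the geometric construction is that it is exact: the diagonal is the new side by construction, with no numerical approximation involved, which is why the researchers pressed the model for it.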
Despite knowing the researchers preferred a geometric approach, ChatGPT continued to use algebra. When challenged, it made a notable error, incorrectly claiming that a geometric solution was impossible because a rectangle's diagonal cannot be used to double its area. This claim is false: the diagonal can indeed be used in such constructions.

The team emphasizes that these results should not be taken as evidence that ChatGPT "thinks" like a human. It lacks consciousness, intention, and genuine understanding. Nevertheless, the model's behavior exhibited traits of a learner: testing hypotheses, adapting to feedback, and sometimes making human-like mistakes.

Stylianides noted that AI-generated solutions cannot be assumed to be valid, just as students must learn to critically evaluate their own work, and he stressed that understanding and assessing AI-generated reasoning must become part of mathematics education. Marco added that prompting AI to collaborate, for example by saying, "Let's explore this together," leads to more meaningful interactions than simply demanding answers.

The findings, published in the International Journal of Mathematical Education in Science and Technology, highlight the growing need for students and educators to develop new skills for engaging with AI as a dynamic, if fallible, partner in learning.
