AI weather models face physical limits in hurricane forecasts
A new study from Rice University highlights the rapid rise of artificial intelligence in weather forecasting while revealing critical physical limitations in current systems. Published in the Journal of Geophysical Research: Atmospheres, the research evaluates how AI models simulate tropical cyclones, comparing leading tools Pangu-Weather and Aurora against historical reanalysis data. While these AI systems can generate global forecasts in minutes, a stark contrast to the hours required by traditional physics-based models, the study finds they struggle to accurately reproduce the physical structure of storms, particularly wind patterns.
Researchers led by Professor Avantika Gori tested the models using roughly 200 simulated storms from the North Atlantic and western North Pacific basins between 2020 and 2025. The analysis confirmed that AI tools excel at predicting storm tracks. Both systems consistently reproduced where storms traveled and where they made landfall, a capability that is vital for evacuation planning and early warnings. This success offers reassurance regarding the utility of AI for general hazard tracking.
However, the models showed mixed performance regarding storm intensity. While earlier AI systems frequently underestimated the strength of major hurricanes, the newer Aurora model matched observed intensity distributions more closely than Pangu-Weather. Pangu-Weather still exhibited significant biases when simulating the most intense cyclones. The researchers also noted a complicating factor: the reference data used for comparison, ERA5, tends to underestimate peak intensities compared to direct observations, meaning agreement with this data does not automatically confirm real-world accuracy.
The most significant finding involves the physical realism of the simulated wind fields. Although many AI-generated storms appeared visually convincing, detailed testing revealed violations of fundamental atmospheric physics. Specifically, tests for gradient wind balance, a key relationship governing mature cyclones, showed notable deviations near storm centers. Furthermore, both models tended to overestimate the size of the inner core in stronger storms. Since cyclone impacts depend heavily on how winds are organized internally, these structural inaccuracies can distort projections of wind damage, rainfall, and storm surge.
The study underscores that while AI offers unprecedented speed and efficiency, it is not self-validating. Professor Gori emphasized that the lack of transparency in how these massive neural networks generate predictions makes systematic evaluation essential for high-consequence events. The findings suggest that AI forecasts should not replace human expertise but rather serve as a complement to it. Gori advised that forecasters can use the identified biases to apply necessary corrections. For example, if a model systematically underestimates intensity, human operators can adjust the output before issuing warnings.
The broader conclusion is that advancing AI weather technology responsibly requires close collaboration between atmospheric scientists and AI developers. As these tools become central to forecasting, continuous scientific input and refinement are necessary to ensure that model outputs remain physically meaningful and reliable for public safety.
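For readers curious what a gradient wind balance test involves, the following is a minimal sketch, not the study's actual procedure. It assumes an axisymmetric storm, a constant air density, and a synthetic Holland-style pressure profile. The function names and all parameter values are hypothetical and chosen only for illustration. The balance itself is the standard relation v²/r + f·v = (1/ρ)·∂p/∂r, where v is the tangential wind, r the distance from the storm center, f the Coriolis parameter, ρ the air density, and p the pressure.

```python
import numpy as np

OMEGA = 7.2921e-5  # Earth's rotation rate, rad/s
RHO = 1.15         # assumed constant near-surface air density, kg/m^3


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))


def holland_like_pressure(r, p_center=95000.0, dp=5000.0, r_scale=40000.0):
    """Synthetic radial pressure profile in Pa (illustrative values only)."""
    return p_center + dp * np.exp(-r_scale / r)


def gradient_wind(r, p, lat_deg, rho=RHO):
    """Tangential wind (m/s) that exactly balances the pressure profile p(r)."""
    f = coriolis(lat_deg)
    dpdr = np.gradient(p, r)  # radial pressure gradient, Pa/m
    half_fr = 0.5 * f * r
    # Positive root of v^2/r + f*v - (1/rho) dp/dr = 0
    return -half_fr + np.sqrt(half_fr**2 + r * dpdr / rho)


def balance_residual(r, v, p, lat_deg, rho=RHO):
    """Residual of gradient wind balance in m/s^2; zero means balanced."""
    f = coriolis(lat_deg)
    return v**2 / r + f * v - np.gradient(p, r) / rho


# Radii from 5 km to 500 km from the storm center
r = np.linspace(5e3, 5e5, 200)
p = holland_like_pressure(r)
v_balanced = gradient_wind(r, p, lat_deg=25.0)

# A wind field 20% weaker than the balanced one leaves a nonzero residual
residual = balance_residual(r, 0.8 * v_balanced, p, lat_deg=25.0)
print("largest imbalance (m/s^2):", np.abs(residual).max())
```

In an evaluation along these lines, the wind and pressure fields would come from the model output rather than a synthetic profile, and large residuals near the storm center would flag the kind of physical inconsistency the study describes.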
