
Pre-trained Large Language Models Learn Hidden Markov Models In-context

Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
Published: 6/10/2025
Abstract

Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL) – their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences – an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
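
To make concrete what "predictive accuracy approaching the theoretical optimum" refers to, the sketch below is a minimal illustration (not the authors' code; the 3-state HMM parameters A, B, pi and the helper names are hypothetical): it samples a synthetic HMM observation sequence and computes the oracle next-observation distribution via the forward algorithm, which is the Bayes-optimal baseline that an in-context learner's next-token predictions would be compared against.

```python
# Illustrative sketch: synthetic HMM sampling and the forward-algorithm oracle.
# All parameters below are hypothetical, chosen only for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 4-symbol HMM: A = state transitions, B = emissions, pi = initial dist.
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
B = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])

def sample_hmm(T):
    """Sample a length-T observation sequence from the HMM (A, B, pi)."""
    obs, z = [], rng.choice(len(pi), p=pi)
    for _ in range(T):
        obs.append(int(rng.choice(B.shape[1], p=B[z])))  # emit a symbol from state z
        z = rng.choice(A.shape[0], p=A[z])               # transition to the next hidden state
    return obs

def oracle_next_dist(obs):
    """Forward-algorithm belief state -> optimal distribution over the next observation."""
    belief = pi.copy()
    for o in obs:
        belief = belief * B[:, o]      # condition on the emitted symbol
        belief /= belief.sum()         # normalize to P(state | history)
        belief = A.T @ belief          # propagate one step through the Markov chain
    return belief @ B                  # P(next observation | history)

seq = sample_hmm(200)
print("oracle P(next symbol | history):", np.round(oracle_next_dist(seq), 3))

# For ICL, one would instead serialize the sequence into a prompt,
# e.g. " ".join(map(str, seq)), and compare the LLM's next-token
# probabilities against the oracle distribution printed above.
```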