Gemma-2B, 12B-IT Utilize Three-Phase Factual Recall Circuit

Recent mechanistic interpretability research has successfully localized a consistent three-phase factual recall circuit within Google’s Gemma-2B and Gemma-12B-IT transformer models. The study applied activation patching techniques across 60 curated prompt pairs spanning 20 distinct knowledge categories to pinpoint exactly where and how large language models store and retrieve information. By calculating logit differences between clean and corrupted prompts and introducing a TotalSwing metric to rank signal strength, researchers isolated the causal pathways responsible for factual retrieval. The experimental framework leveraged TransformerLens to monitor residual streams, attention mechanisms, and MLP sublayers, enabling precise measurement of internal information flow.

The analysis revealed a highly structured, scale-invariant retrieval process. The storage phase occurs in the early layers, with factual information encoded as directional vectors within the residual stream at the entity token position. In the 2B model, this spans layers 0 through 14, while in the 12B model it extends across layers 0 through 27. The residual stream acts as the primary causal driver during this stage, significantly outperforming attention and MLP outputs.

The routing phase involves distributed attention heads that gradually shift the encoded signal from the entity token to the final prediction position. No single attention head dominates this process; the workload is shared across the network, with individual head interventions yielding negligible causal impact compared to full residual stream modifications.

The readout phase takes place in the late layers, where subsequent blocks function primarily as pass-through channels. The model retrieves rather than computes the answer, indicating that factual information is fully formed before reaching the output stage.

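The logit-difference measurement described above can be illustrated with a small sketch. It is not the study's code: the function names, the toy logits, and the normalization (the share of the clean-versus-corrupted gap that a patch restores, a common convention in activation patching) are assumptions made for illustration, as is the choice to patch a clean activation into the corrupted run. The TotalSwing metric is not defined in the summary, so it is not reproduced here.

```python
from typing import Sequence


def logit_diff(logits: Sequence[float], correct_id: int, wrong_id: int) -> float:
    """Logit of the correct answer token minus logit of the competing answer token."""
    return logits[correct_id] - logits[wrong_id]


def normalized_effect(patched: float, clean: float, corrupted: float) -> float:
    """Fraction of the clean-vs-corrupted gap restored by a patch.

    0.0 means the patch changed nothing; 1.0 means full recovery of clean behavior.
    (Assumed normalization; the summary does not state which one the study used.)
    """
    gap = clean - corrupted
    if gap == 0:
        raise ValueError("clean and corrupted runs give the same logit difference")
    return (patched - corrupted) / gap


# Toy final-position logits over a 4-token vocabulary (made-up numbers).
CORRECT, WRONG = 2, 3
clean_logits = [0.1, 0.0, 5.0, 1.0]
corrupted_logits = [0.1, 0.0, 1.0, 5.0]
# Hypothetical corrupted run with one clean activation patched in.
patched_logits = [0.1, 0.0, 4.0, 2.0]

clean_ld = logit_diff(clean_logits, CORRECT, WRONG)
corrupted_ld = logit_diff(corrupted_logits, CORRECT, WRONG)
patched_ld = logit_diff(patched_logits, CORRECT, WRONG)
effect = normalized_effect(patched_ld, clean_ld, corrupted_ld)

print(f"clean={clean_ld}, corrupted={corrupted_ld}, patched={patched_ld}, effect={effect}")
```

In a real experiment the three logit vectors would come from model forward passes, with one normalized effect computed per patched layer, position, or component.
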
Replicating the experiments across model architectures confirmed that this three-phase circuit scales proportionally, maintaining identical structural logic despite differences in parameter count. However, researchers noted that tokenizer-induced dataset drift can complicate cross-model comparisons. The Gemma-12B-IT tokenizer parsed certain physical units differently than expected, necessitating dynamic prompt pair validation before experimental deployment.

Constrained by high-performance computing disk quotas, the team paused larger-scale testing on models like LLaMA-70B but outlined clear next steps. Future investigations will prioritize path patching to map directed causal edges, alongside sparse autoencoders to decode residual stream semantics. Cross-architecture replication and testing against diffusion language models remain critical objectives. Ultimately, this research establishes a foundational blueprint for transformer factual retrieval, providing engineers with precise intervention points for diagnosing and mitigating model failures.
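The prompt-pair validation mentioned above might look something like the following sketch. The summary does not say which checks the team ran, so the criteria here are assumptions: clean and corrupted prompts must tokenize to the same length (so that patched positions line up) and must differ in at least one token. `validate_pair` and `whitespace_tokenize` are hypothetical names; the whitespace tokenizer is only a stand-in for the tokenizer of the model under test.

```python
from typing import Callable, List, Tuple

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer for the demo; a real check would use the model's own tokenizer."""
    return text.split()


def validate_pair(clean: str, corrupted: str, tokenize: Tokenizer) -> Tuple[bool, str]:
    """Return (is_valid, reason) for one clean/corrupted prompt pair."""
    clean_toks = tokenize(clean)
    corrupted_toks = tokenize(corrupted)
    # Position-wise patching assumes both prompts have the same number of tokens.
    if len(clean_toks) != len(corrupted_toks):
        return False, f"length mismatch: {len(clean_toks)} vs {len(corrupted_toks)}"
    differing = [i for i, (a, b) in enumerate(zip(clean_toks, corrupted_toks)) if a != b]
    if not differing:
        return False, "prompts are identical after tokenization"
    return True, f"ok, differs at positions {differing}"


# A pair that aligns token for token.
ok_pair = validate_pair(
    "The capital of France is", "The capital of Italy is", whitespace_tokenize
)
# A pair where a unit is written differently and shifts the token count.
bad_pair = validate_pair(
    "The boiling point of water is 100 °C",
    "The boiling point of water is 212 degrees F",
    whitespace_tokenize,
)

print(ok_pair)
print(bad_pair)
```

Running such a check against each model's tokenizer before an experiment would flag pairs that were valid for one model but drift out of alignment for another.
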
