a year ago

Small but Significant: On the Promise of Small Language Models for Accessible AIED

Yumou Wei Paulo Carvalho John Stamper

Abstract

GPT has become nearly synonymous with large language models (LLMs), an increasingly popular term in AIED proceedings. A simple keyword-based search reveals that 61% of the 76 long and short papers presented at AIED 2024 describe novel solutions using LLMs to address some of the long-standing challenges in education, and 43% specifically mention GPT. Although LLMs pioneered by GPT create exciting opportunities to strengthen the impact of AI on education, we argue that the field's predominant focus on GPT and other resource-intensive LLMs (with more than 10B parameters) risks neglecting the potential impact that small language models (SLMs) can make in providing resource-constrained institutions with equitable and affordable access to high-quality AI tools. Supported by positive results on knowledge component (KC) discovery, a critical challenge in AIED, we demonstrate that SLMs such as Phi-2 can produce an effective solution without elaborate prompting strategies. Hence, we call for more attention to developing SLM-based AIED approaches.

One-sentence Summary

Demonstrating that the small language model Phi-2 effectively solves knowledge component discovery without elaborate prompting, the authors advocate for SLMs as a resource-efficient alternative to large language models to advance equitable access in AIED.

Key Contributions

  • This work highlights Phi-2, an existing small language model trained on curated textbook-quality data, which requires only 5.4 GB of memory and can therefore run local inference on consumer-grade hardware in resource-constrained educational settings.
  • Reported results on GSM8K, HumanEval, MBPP, and MMLU indicate that Phi-2 matches or exceeds the performance of significantly larger models such as Llama-2 and Mistral across mathematical reasoning, coding, and broad academic knowledge tasks.
  • A knowledge component discovery algorithm is developed that uses the model's token probability estimates, rather than generated text, to outperform instructional experts and GPT-based baselines without relying on elaborate prompting strategies.
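
The 5.4 GB figure is consistent with a simple back-of-the-envelope estimate. The sketch below assumes roughly 2.7 billion parameters stored at 16-bit precision (2 bytes each) and counts weights only; the function name is illustrative, and real memory use during inference will be somewhat higher because of activations and caches.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

# Assumption: ~2.7 billion parameters at 16-bit precision.
phi2_gb = weight_memory_gb(2.7e9)

# For comparison: the 10B-parameter threshold the abstract uses for "resource-intensive".
llm_10b_gb = weight_memory_gb(10e9)

print(f"Phi-2 weights: {phi2_gb:.1f} GB")
print(f"10B-parameter model weights: {llm_10b_gb:.1f} GB")
```

Under these assumptions, a 10B-parameter model needs about four times the weight memory of Phi-2 before any runtime overhead is counted.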

Introduction

The rapid integration of large language models into educational technology promises advanced AI-driven tutoring and assessment capabilities, yet their substantial computational requirements and reliance on third-party cloud APIs create significant barriers for underfunded institutions and raise critical student privacy concerns. This community-wide preference for resource-heavy models often ignores the practical constraints of classroom deployment, where limited budgets, modest hardware, and data sovereignty dictate technology adoption. The authors use small language models such as Phi-2 to demonstrate that prioritizing data quality over parameter count yields highly capable tools that run efficiently on consumer-grade hardware. By repurposing Phi-2 as a probabilistic similarity engine for knowledge component discovery, they show that a smaller model can outperform both human experts and larger GPT systems on this task while offering a more accessible, affordable, and privacy-preserving solution for educational settings.

Method

The authors leverage the intrinsic probabilistic capabilities of a language model to develop a novel approach for knowledge component (KC) discovery, moving beyond conventional text generation methods. Rather than relying on prompting large language models (LLMs) to generate KC labels directly, the method treats the language model as a "probability machine" that can estimate the likelihood of textual sequences. This allows the authors to define a measure of question similarity based on the concept of question congruity, which is mathematically equivalent to pointwise mutual information (PMI) between two questions. The core idea is that if the presence of one question increases the probability of another question appearing in a given context, the two questions are considered congruent and likely to share a common knowledge component.
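
The PMI form of congruity can be written as log p(q2 | q1) - log p(q2), where each sequence log-probability is the sum of its per-token conditional log-probabilities. The following is a minimal sketch under the assumption that those per-token log-probabilities have already been obtained from a language model; the function names and the toy numbers are illustrative and do not come from the paper.

```python
def sequence_logprob(token_logprobs):
    """Log-probability of a sequence: sum of per-token conditional log-probs (chain rule)."""
    return sum(token_logprobs)

def congruity(logp_q2_given_q1, logp_q2):
    """PMI form: log p(q2 | q1) - log p(q2). Positive means q1 makes q2 more likely."""
    return logp_q2_given_q1 - logp_q2

# Toy per-token log-probs for question q2 (made-up numbers, not model output).
q2_alone = [-2.0, -1.5, -3.0]       # q2 scored with no preceding context
q2_after_q1 = [-1.0, -1.0, -2.5]    # q2 scored with q1 placed before it

score = congruity(sequence_logprob(q2_after_q1), sequence_logprob(q2_alone))
print(score)  # positive here, so the toy pair would count as congruent
```

A score near zero would suggest the two questions are roughly independent, and a negative score would suggest that seeing one makes the other less likely.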

To operationalize this, the authors use Phi-2, a small language model (SLM) trained on curated textbook-quality data, to compute the probabilities required by the congruity formula. The model is configured to use top-1 sampling (that is, always selecting the single most probable token), which makes token selection deterministic at each step and enables reliable estimation of conditional probabilities. For each pair of multiple-choice questions (MCQs), the framework calculates a congruity score that reflects how strongly the two questions are related in terms of their underlying KCs. These pairwise scores are then fed into a clustering algorithm that groups questions likely to share the same KC.
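
The summary does not name the clustering algorithm, so the sketch below substitutes a deliberately simple stand-in: it symmetrizes the pairwise congruity scores, links any two questions whose score exceeds a threshold, and returns the connected components. The function name, the threshold of 0, and the toy score matrix are all assumptions made for illustration.

```python
from itertools import combinations

def cluster_by_congruity(scores, threshold=0.0):
    """Group question indices into connected components of the 'congruent' graph.

    scores[i][j] is the congruity of question j given question i.
    Stand-in clustering only; the source does not specify the algorithm used.
    """
    n = len(scores)
    parent = list(range(n))

    def find(i):
        # Find the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        symmetric = 0.5 * (scores[i][j] + scores[j][i])
        if symmetric > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy congruity matrix for four questions (made-up numbers).
toy_scores = [
    [0.0, 1.2, -0.5, -0.8],
    [1.0, 0.0, -0.3, -0.6],
    [-0.4, -0.2, 0.0, 0.9],
    [-0.7, -0.5, 1.1, 0.0],
]
clusters = cluster_by_congruity(toy_scores)
print(clusters)  # [[0, 1], [2, 3]]
```

Each resulting group is a candidate knowledge component. Any clustering method that accepts a pairwise similarity matrix could take the place of this one.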

