RustGPT: A Fully Custom LLM Built in Rust from Scratch with No External ML Frameworks
RustGPT is a complete transformer-based large language model implemented entirely in Rust, without relying on external machine learning frameworks. Built from the ground up using only the ndarray library for matrix operations, it demonstrates how to construct a functional LLM from pure Rust and fundamental linear algebra.

The project includes a full training pipeline covering both pre-training on factual knowledge and instruction tuning for conversational behavior. It features an interactive mode for testing, full backpropagation with gradient clipping, and a modular design that separates concerns across distinct components. Key implementation components include tokenization, embeddings, multi-head self-attention, feed-forward networks, layer normalization, output projection, and an Adam optimizer, all written in pure Rust.

The model architecture consists of three transformer blocks, with an embedding dimension of 128, a hidden dimension of 256, and a maximum sequence length of 80 tokens.

Training occurs in two phases. First, the model learns general facts such as "The sun rises in the east" and "Water flows downhill due to gravity." Then it undergoes instruction tuning to learn conversational patterns, such as responding to questions about mountain formation or rainfall. After training, users can interact with the model directly by entering prompts. For example, asking "How do mountains form?" yields a response like "Mountains are formed through tectonic forces or volcanism over long geological time periods."

The implementation uses cross-entropy loss, Adam optimization with a learning rate of 0.0005 during pre-training and 0.0001 during instruction tuning, and gradient clipping with an L2 norm cap of 5.0. It supports greedy decoding for text generation and includes comprehensive unit tests for every module. The project is designed for educational purposes, offering a clear path to understanding how modern LLMs work under the hood.
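The hyperparameters quoted above can be collected into a single configuration struct. This is a sketch, not the project's actual code: the struct and field names (ModelConfig, n_blocks, and so on) are hypothetical.

```rust
// Hypothetical hyperparameter struct mirroring the values described above.
// Names are illustrative; the project's real identifiers may differ.
#[derive(Debug, Clone, Copy)]
struct ModelConfig {
    n_blocks: usize,     // number of transformer blocks
    embed_dim: usize,    // embedding dimension
    hidden_dim: usize,   // feed-forward hidden dimension
    max_seq_len: usize,  // maximum sequence length in tokens
    pretrain_lr: f64,    // Adam learning rate during pre-training
    finetune_lr: f64,    // Adam learning rate during instruction tuning
    grad_clip_norm: f64, // L2 norm cap for gradient clipping
}

impl Default for ModelConfig {
    fn default() -> Self {
        ModelConfig {
            n_blocks: 3,
            embed_dim: 128,
            hidden_dim: 256,
            max_seq_len: 80,
            pretrain_lr: 5e-4,
            finetune_lr: 1e-4,
            grad_clip_norm: 5.0,
        }
    }
}
```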
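The cross-entropy loss mentioned above reduces, for a single position, to the negative log-probability of the correct next token under a softmax over the logits. A minimal sketch using plain slices rather than the project's ndarray types:

```rust
// Cross-entropy loss for one position: -log softmax(logits)[target].
// Computed via a numerically stable log-sum-exp (subtracting the max logit).
fn cross_entropy(logits: &[f64], target: usize) -> f64 {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_sum_exp = logits
        .iter()
        .map(|&x| (x - max).exp())
        .sum::<f64>()
        .ln()
        + max;
    log_sum_exp - logits[target]
}
```

With uniform logits over a vocabulary of size V, this returns ln(V), the expected loss of an untrained model.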
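Gradient clipping with an L2 norm cap, as described above, rescales the gradient vector whenever its norm exceeds the cap, leaving it untouched otherwise. A sketch under the same plain-slice assumption (the function name is hypothetical):

```rust
// Clip a gradient vector to a maximum L2 norm (the document quotes a cap of 5.0).
// If the norm exceeds `max_norm`, scale every component by max_norm / norm.
// Returns the pre-clipping norm, which is often useful for logging.
fn clip_gradients(grads: &mut [f64], max_norm: f64) -> f64 {
    let norm = grads.iter().map(|g| g * g).sum::<f64>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
    norm
}
```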
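The Adam optimizer referenced above maintains bias-corrected first- and second-moment estimates per parameter. A compact sketch with standard default coefficients (beta1 = 0.9, beta2 = 0.999); the struct shape is illustrative, not the project's actual optimizer API:

```rust
// Standard Adam update over a flat parameter slice.
// m and v are exponential moving averages of the gradient and squared gradient.
struct Adam {
    lr: f64,
    b1: f64,
    b2: f64,
    eps: f64,
    m: Vec<f64>,
    v: Vec<f64>,
    t: u64, // step counter for bias correction
}

impl Adam {
    fn new(n_params: usize, lr: f64) -> Self {
        Adam {
            lr,
            b1: 0.9,
            b2: 0.999,
            eps: 1e-8,
            m: vec![0.0; n_params],
            v: vec![0.0; n_params],
            t: 0,
        }
    }

    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        self.t += 1;
        for i in 0..params.len() {
            self.m[i] = self.b1 * self.m[i] + (1.0 - self.b1) * grads[i];
            self.v[i] = self.b2 * self.v[i] + (1.0 - self.b2) * grads[i] * grads[i];
            // Bias-corrected moment estimates.
            let m_hat = self.m[i] / (1.0 - self.b1.powi(self.t as i32));
            let v_hat = self.v[i] / (1.0 - self.b2.powi(self.t as i32));
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```

On the first step the bias correction makes the update magnitude approximately the learning rate times the gradient's sign, which is why the two quoted learning rates (5e-4 and 1e-4) directly bound the initial per-parameter step size.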
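Greedy decoding, as used for generation here, simply takes the argmax token at each step until an end token or the length limit is reached. A sketch where the hypothetical `forward` closure stands in for the model's logits computation:

```rust
// Index of the largest logit (ties resolved toward the later index by max_by).
fn argmax(logits: &[f64]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

// Greedy generation loop: append the argmax token each step, stopping at
// `eos` or `max_len`. `forward` is a stand-in for the model's logits function.
fn generate(
    mut tokens: Vec<usize>,
    max_len: usize,
    forward: impl Fn(&[usize]) -> Vec<f64>,
    eos: usize,
) -> Vec<usize> {
    while tokens.len() < max_len {
        let next = argmax(&forward(&tokens));
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    tokens
}
```

The quoted maximum sequence length of 80 tokens would be the natural value for `max_len`.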
It avoids dependencies on PyTorch, TensorFlow, or Candle, relying solely on Rust and ndarray.

Contributions are welcome. High-priority features include saving and loading model weights, performance improvements via SIMD or parallelization, and advanced sampling methods such as top-k, top-p, and temperature scaling. Other areas for enhancement include better positional encodings, training visualizations, and attention analysis.

To get started, clone the repository, run cargo run to train the model, and test it interactively. Use cargo test to run all tests or to target specific components. For development, follow Rust conventions, write tests, and keep the implementation framework-agnostic.

Ideal for learners, experimenters, and developers interested in low-level AI systems, RustGPT provides a hands-on way to explore transformer mechanics and LLM training in a fully self-contained Rust environment.
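For contributors eyeing the sampling roadmap, here is one possible shape for combined temperature scaling and top-k sampling. This is purely a sketch of the planned feature, not existing project code; to stay dependency-free it takes a caller-supplied uniform draw `u` in [0, 1) instead of pulling in an RNG crate.

```rust
// Hypothetical top-k + temperature sampler: keep the k highest logits,
// softmax them at the given temperature, then pick by inverse-CDF using
// the supplied uniform draw u in [0, 1).
fn sample_top_k(logits: &[f64], k: usize, temperature: f64, u: f64) -> usize {
    // Sort candidate indices by logit, descending, and keep the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k.max(1));

    // Softmax over the kept logits, scaled by temperature (stable via max shift).
    let max = logits[idx[0]] / temperature;
    let weights: Vec<f64> = idx
        .iter()
        .map(|&i| (logits[i] / temperature - max).exp())
        .collect();
    let total: f64 = weights.iter().sum();

    // Walk the cumulative distribution until it passes u.
    let mut acc = 0.0;
    for (j, w) in weights.iter().enumerate() {
        acc += w / total;
        if u < acc {
            return idx[j];
        }
    }
    *idx.last().unwrap()
}
```

With k = 1 this degenerates to greedy decoding, and low temperatures sharpen the distribution toward the argmax, so the sampler subsumes the existing generation strategy.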
