Stanford CS336 course: Language Modeling from Scratch

Stanford University has introduced CS336, a rigorous five-unit course that teaches students how to build language models from scratch. With language models now the foundation of modern natural language processing, the class aims to give engineers and scientists a deep understanding of the entire development lifecycle. The curriculum is modeled on operating systems courses in which students build a system from the ground up: it guides them through data collection, cleaning, transformer architecture construction, model training, evaluation, and deployment. Unlike typical AI classes that offer extensive scaffolding, CS336 requires students to write at least ten times as much code as standard coursework.

Proficiency in Python is mandatory, since most assignments are implemented in it. Students must also have strong experience in deep learning and systems optimization, specifically PyTorch, GPU performance, and memory hierarchy concepts. The course expects a solid grasp of calculus, linear algebra, probability, statistics, and machine learning fundamentals.

The syllabus covers a wide range of technical topics over a 19-week schedule. Early weeks focus on tokenization, PyTorch operations, and resource accounting. Sessions in the middle of the term cover architectural designs, attention mechanisms, and hardware acceleration on GPUs and TPUs, including kernel programming with Triton. Later modules address scaling laws, parallelism strategies, inference optimization, and data processing techniques such as filtering and deduplication. The course also explores post-training methods such as supervised fine-tuning and reinforcement learning from human feedback, and concludes with advanced topics on alignment and multimodality.

Assignments are implementation-heavy and require a significant time investment. Students following the course independently are expected to use cloud-based GPU providers for their training runs.

To ensure academic integrity, the course adheres to a strict honor code. Study groups are permitted for discussion, but each student must complete and submit their own work. Large language models may be used only for low-level programming questions or high-level conceptual clarification; generating solutions directly with AI is prohibited. Students are also strongly encouraged to disable AI-based code autocomplete tools to foster deeper engagement with the material. Using existing code found online is generally forbidden unless explicitly permitted in the course handouts.

The course is made possible in part through sponsorship from Modal, which provides compute resources. Key deadlines are distributed throughout the term, with assignments released and due weekly. The course concludes with guest lectures from industry experts and a final assignment. This intensive class is designed for those who want to master the underlying mechanics of artificial intelligence rather than simply apply pre-built tools.
