A 2.5-year-old laptop can now run GLM-4.5 Air and code a working Space Invaders game in JavaScript
A 2.5-year-old laptop can now run the GLM-4.5 Air model with MLX and generate a working Space Invaders game in JavaScript.

On July 29, 2025, Chinese AI lab Z.ai released the new GLM-4.5 model family. These open-source models, licensed under MIT, are claimed to perform well at coding tasks, even outperforming models like Claude Sonnet 4 on some benchmarks.

GLM-4.5 Air is the lighter model in the family, but it is still large: 106 billion parameters, weighing 205.78GB on Hugging Face. Ivan Fioravanti created a 3-bit quantized version of it for MLX, a 44GB download designed to run on machines with 64GB of RAM. I tested it on my 64GB MacBook Pro M2 and it works very well.

I gave the model this prompt: "Write an HTML and JavaScript page implementing space invaders". It generated a complete, functional implementation that needed no additional edits. The example isn't groundbreaking, but it's impressive that a model running on a two-and-a-half-year-old laptop can produce code like this.

To run the model I used the latest main branch of the mlx-lm library, which had just gained support for GLM-4.5. I installed it via uv and ran the following in a Python interpreter (the whole sequence is collected into a single script at the end of this post). First I loaded the model:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/GLM-4.5-Air-3bit")
```

This downloaded the 44GB of model weights to my `~/.cache/huggingface/hub/models--mlx-community--GLM-4.5-Air-3bit` folder.

Next, I set the prompt and used the tokenizer to format it as chat input:

```python
prompt = "Write an HTML and JavaScript page implementing space invaders"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```

Then I generated the response:

```python
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=8192)
```

The response began with a detailed plan for the game: a player spaceship that can move and shoot, enemy invaders that move in formation and shoot back, a score tracker, a lives system, and game over conditions.

The model used around 47.687GB of memory at peak, leaving only about 16GB for everything else on the machine. I had to close several applications to make space, but once it was running the performance was smooth.

I also tested the model with a different prompt: "Generate an SVG of a pelican riding a bicycle." The result was a creative and detailed image, and the model again used approximately 48GB of RAM while generating it.

This demonstrates how far local coding models have come. In 2025 many models have been optimized specifically for coding tasks, and that focus is clearly delivering results. Just two years ago, when I first experimented with LLaMA, I couldn't have imagined that the same laptop would one day run models as capable as GLM-4.5 Air, Mistral 3.2 Small, Gemma 3, Qwen 3, and the others that have emerged in the past six months.
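If you want to try this yourself, here is that whole session as one script. This is a minimal sketch: the `uv run` incantation in the comment is an assumed way to pull the mlx-lm main branch (not necessarily the exact command I used), and the output filename is hypothetical, but the `load()` and `generate()` calls are the same ones shown above.

```python
# A minimal sketch collecting the steps above into one script.
# Assumption: one plausible way to run it against the mlx-lm main
# branch with uv (not necessarily the exact command used above):
#
#   uv run --with 'mlx-lm @ git+https://github.com/ml-explore/mlx-lm' python invaders.py

from mlx_lm import load, generate

# First run downloads the 44GB of weights to ~/.cache/huggingface/hub/
model, tokenizer = load("mlx-community/GLM-4.5-Air-3bit")

messages = [
    {
        "role": "user",
        "content": "Write an HTML and JavaScript page implementing space invaders",
    }
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams tokens to the terminal as they are generated
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=8192)

# The response mixes the model's written plan with the HTML, so save it
# all and extract the code block by hand ("invaders-output.md" is a
# hypothetical filename, not one from the original session)
with open("invaders-output.md", "w") as f:
    f.write(response)
```

Saving the raw response to a file makes it easy to pull the HTML out into its own page and open it in a browser.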