HyperAI

OpenAI has quietly introduced support for a feature called "Skills" in both ChatGPT and its open-source Codex CLI tool, following an approach pioneered by Anthropic. The update is a step toward making AI systems more modular, extensible, and capable of handling complex, real-world tasks.

In ChatGPT, the functionality is integrated into Code Interpreter. A new directory at /home/oai/skills is now accessible, and users can explore it by asking the model to create a zip file of that folder. The contents include pre-built skills for handling common file types such as spreadsheets, DOCX, and PDFs. For PDFs, OpenAI’s approach converts each page into a PNG image and then uses a vision-enabled GPT model to extract the text while preserving layout, graphics, and formatting, which captures more of the document than plain text extraction would.

The skills are structured as simple folders containing a SKILL.md file and optional scripts or resources, mirroring Anthropic’s open and lightweight design. This makes them easy to implement across different platforms.

In one test, the model was asked to generate a PDF summarizing the current state of the rimu tree mast and its impact on the kākāpō breeding season. It accessed the relevant skill, conducted research, and produced a well-formatted document, including font adjustments so that the Māori macrons in "kākāpō" displayed correctly. The process took over eleven minutes, but the model was thorough: it checked its own output and fixed font issues partway through, showing a degree of self-checking and quality control.

Meanwhile, the Codex CLI tool has added experimental support for skills through a recent pull request. With the --enable-skills flag, Codex recognizes and uses any folder in ~/.codex/skills that contains a SKILL.md file. Users can create their own skills, such as one for building Datasette plugins, and load them into the system. Once a skill is enabled, the model can be prompted to generate code, test it, and even serve it via a local web server.

This enables a kind of agentic behavior in which the model not only reasons and generates but also executes, tests, and iterates on real code in a structured, reusable way, making for a more hands-on AI experience.

OpenAI’s rapid adoption of the approach, just months after Anthropic’s initial release, underscores the potential of Skills as a foundational standard for next-generation AI tools. While the format remains informal, its simplicity and flexibility make it a strong candidate for broader standardization. A formal specification, perhaps led by a new initiative like the Agentic AI Foundation, could help unify the ecosystem and improve interoperability across platforms.
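
The folder convention described above for Codex CLI (any folder in the skills directory that contains a SKILL.md file counts as a skill) can be illustrated with a minimal Python sketch. This is not Codex's actual implementation: the function name discover_skills is hypothetical, and the choice to look only at immediate subfolders is an assumption, since the article does not say how nested folders are treated.

```python
import tempfile
from pathlib import Path


def discover_skills(root: Path) -> list[str]:
    """Return the names of immediate subfolders of `root` that contain SKILL.md.

    Hypothetical helper for illustration only; it mirrors the convention
    described in the article, not any real Codex or ChatGPT code.
    """
    if not root.is_dir():
        # A missing skills directory simply means no skills are available.
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


if __name__ == "__main__":
    # Build a throwaway directory tree instead of touching ~/.codex/skills.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "datasette-plugin").mkdir()
        (root / "datasette-plugin" / "SKILL.md").write_text("# Datasette plugin skill\n")
        (root / "scratch").mkdir()  # no SKILL.md, so it is not a skill
        print(discover_skills(root))
```

To try it against a real setup, one could call discover_skills(Path.home() / ".codex" / "skills"), which returns an empty list if the directory does not exist.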

OpenAI Quietly Integrates Skills into ChatGPT and Codex CLI for Enhanced AI Capabilities
