HyperAI

Nvidia Releases ChipNeMo, a Large Model Designed Specifically for Chips

a year ago
zhaorui

Nvidia has released ChipNeMo, a custom large language model trained on its own internal data, to help engineers complete chip-design tasks.

On one side, Jensen Huang attended the company's annual meeting in a floral vest, handkerchief in hand; on the other, Sam Altman is reportedly raising billions of dollars to build new AI chip companies.

This contrast between ease and ambition is a true portrait of today's AI chip field. In an era where computing power is king, Nvidia has nearly everyone by the throat. As a result, AI chip start-ups have sprung up like mushrooms after rain, with some even vowing to benchmark against and replace Nvidia; at the same time, giants such as Microsoft and Google have redoubled their efforts to develop their own chips.

Indeed, since entering the era of intelligence, chips have become the "Achilles' heel" of many technology giants, and the semiconductor industry's high barriers to entry make this technological siege hard to break. Beyond the manufacturing step, whose difficulty Huawei's experience has already demonstrated, semiconductor design is also extremely challenging. As electronic chips approach the limits of Moore's Law while computing demands keep rising, achieving higher performance on advanced process nodes has become a key challenge in AI chip design.

GH100 full GPU with 144 SMs

As shown in the picture above, under a microscope an advanced chip like the NVIDIA H100 Tensor Core GPU looks like a carefully planned city: tens of billions of transistors connected by "streets" 10,000 times thinner than a human hair. Building the entire city takes multiple engineering teams two years of collaboration.

Within that effort, various departments work together: some define the chip's overall architecture, some design and lay out the ultra-small circuits, and some handle testing. Each task requires specialized methods, software, and computer languages. This complexity is precisely the technical moat of chip manufacturers.

Interestingly, Nvidia, which has been draining big companies' wallets with the most powerful AI chips, has also begun to think about using AI to earn money more "easily". Not long ago, NVIDIA released ChipNeMo, a custom large language model trained on its own internal data. It helps engineers complete chip-design tasks and is currently for internal use only.

This result has been published on arXiv. Paper address:
https://arxiv.org/abs/2311.00176

Customizing an LLM for chip design with domain adaptation techniques

Rather than deploying an existing LLM directly, NVIDIA researchers used NVIDIA NeMo and domain adaptation techniques to customize LLaMA2 base models with 7 billion, 13 billion, and 70 billion parameters.
Note: NVIDIA NeMo is an end-to-end cloud-native framework that allows flexible building, customization, and deployment of generative AI models, including training and inference frameworks, guardrail toolkits, data management tools, and pre-trained models.

ChipNeMo uses a variety of domain adaptation techniques to adapt LLMs to the chip design domain, including:
* custom tokenizers for chip design data
* domain-adaptive continued pretraining (DAPT) on large amounts of domain data
* supervised fine-tuning (SFT) with domain-specific instructions
* fine-tuned retrieval models
* retrieval-augmented generation (RAG)

The researchers evaluated ChipNeMo on three concrete applications: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis.

ChipNeMo training process

Among these techniques, a domain-specific tokenizer improves tokenization efficiency for domain terminology. The researchers adapted the pretrained tokenizer to this study's chip design dataset, adding new tokens only for domain-specific terms.
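To see why this helps, here is a toy illustration (not NVIDIA's actual tokenizer pipeline): a greedy longest-match tokenizer over a fixed vocabulary, showing how adding a domain term as a single token shortens tokenized sequences.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization; unknown characters fall back to single tokens."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        match = next((w for w in sorted(vocab, key=len, reverse=True)
                      if text.startswith(w, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

base_vocab = {"test", "bench", "reg", "ister", " "}
domain_vocab = base_vocab | {"testbench", "register"}  # domain terms added as whole tokens

text = "testbench register"
print(len(tokenize(text, base_vocab)))    # 5 tokens with the base vocabulary
print(len(tokenize(text, domain_vocab)))  # 3 tokens after adding domain terms
```

Fewer tokens per domain document means more content fits in the context window and pretraining sees each term as a single unit.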

For domain-adaptive pretraining (DAPT), the researchers combined NVIDIA's internal chip design data with public datasets, then collected, cleaned, and filtered them. The internal training corpus totals 23.1 billion tokens, covering design, verification, infrastructure, and related internal documentation.
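The paper does not publish its cleaning pipeline, but a minimal sketch of the kind of filtering such a corpus pass involves (length filtering plus exact deduplication, both assumptions here) might look like:

```python
import hashlib

def clean_corpus(docs, min_len=200):
    """Toy cleaning/filtering pass (illustrative, not NVIDIA's pipeline):
    drop very short documents and exact duplicates before blending
    internal and public data into a pretraining corpus."""
    seen, kept = set(), []
    for doc in docs:
        text = doc.strip()
        if len(text) < min_len:       # too short to be useful training text
            continue
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:            # exact duplicate already kept
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```

Real pipelines add near-duplicate detection, quality scoring, and format-specific parsing for code and documentation, but the collect-clean-filter shape is the same.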

For supervised fine-tuning with domain-specific instructions (SFT), the researchers combined a public multi-turn general chat instruction dataset with a small amount of domain-specific instruction data, then performed SFT on the ChipNeMo base model to produce the ChipNeMo Chat model.
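A rough sketch of that data-blending step (the role-tag template and blending scheme are illustrative assumptions, not the paper's exact format):

```python
import random

def flatten_chat(turns):
    """Render one multi-turn chat as a single training string with role tags."""
    return "\n".join(f"<{role}>: {text}" for role, text in turns)

def blend(general, domain, seed=0):
    """Combine a large general chat set with a small domain-specific
    instruction set, shuffled so domain examples are interleaved."""
    data = [flatten_chat(chat) for chat in general + domain]
    random.Random(seed).shuffle(data)
    return data

general_chat = [[("user", "What is an FPGA?"),
                 ("assistant", "A reconfigurable chip.")]]
domain_chat = [[("user", "How do I run lint on this block?"),
                ("assistant", "Use the internal lint flow.")]]
sft_dataset = blend(general_chat, domain_chat)
```

The general chat data teaches instruction-following style; the small domain slice aligns that style with chip-design terminology.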

In addition, the researchers used the Tevatron framework to generate 3,000 domain-specific samples automatically and fine-tuned the e5-small unsupervised model to create this study's domain-adapted retrieval model.
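Dense retrievers like e5 are typically fine-tuned with a contrastive (InfoNCE-style) objective that pulls a query toward its positive passage and away from negatives; the paper does not detail its exact loss, so the snippet below is a generic, self-contained sketch of that objective for a single query.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.05):
    """InfoNCE-style contrastive loss for one query (generic sketch, not the
    paper's exact recipe): sim_pos is the query-positive similarity,
    sim_negs the query-negative similarities. Lower loss means the
    positive passage is ranked more confidently above the negatives."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)  # -log softmax of the positive
```

Averaged over a batch of (query, positive, negatives) triples, this is the quantity a retriever fine-tuning loop minimizes.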

To address the chatbot's common "hallucination" problem, the researchers employed retrieval-augmented generation (RAG) to improve the quality of answers to domain-specific questions.

Specifically, RAG retrieves relevant passages from a database and includes them in the prompt alongside the question, allowing the LLM to generate more accurate, fact-grounded answers. The researchers also found that fine-tuning the unsupervised pretrained dense retrieval model with a moderate amount of domain-specific training data significantly improves retrieval accuracy.
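The retrieve-then-prompt step can be sketched as follows; for a self-contained example, a toy word-overlap scorer stands in for ChipNeMo's fine-tuned dense retriever, and the prompt template is an assumption.

```python
def retrieve(question, docs, k=2):
    """Rank passages by relevance to the question. Here a toy lexical
    word-overlap score substitutes for a dense embedding retriever."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, docs, k=2):
    """RAG prompt assembly: prepend the top-k retrieved passages so the
    LLM grounds its answer in them instead of hallucinating."""
    context = "\n".join(f"- {p}" for p in retrieve(question, docs, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = ["clock tree synthesis notes", "lunch menu", "register file spec"]
print(build_prompt("what is the register file", docs, k=1))
```

Swapping the scorer for embedding similarity from a fine-tuned model is exactly the improvement the researchers measured.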

RAG Implementation Process


Beyond adapting large language models to the chip design domain, domain adaptation also lets models with up to 5x fewer parameters match the performance of larger general-purpose models, thereby reducing inference costs.

It is worth mentioning that all models were trained on 128 A100 GPUs. The researchers estimated the cost of domain-adaptive pretraining for ChipNeMo, as shown in the table below: DAPT accounts for less than 1.5% of the cost of pretraining the base model from scratch.

Custom model with 13 billion parameters surpasses LLaMA2

The researchers monitored and evaluated the actual performance of ChipNeMo in three chip design applications: Engineering Assistant Chatbot, EDA Script Generation, and Bug Summarization and Analysis.

First, the engineering assistant chatbot helps chip design engineers answer questions about architecture, design, verification, and more, preventing them from writing code based on wrong assumptions or debugging unfamiliar code, thereby improving productivity. The chatbot can also extract relevant knowledge from internal design documents, code, other recorded design data, and technical communication traces (emails, company instant messages, etc.).

Engineering Assistant Chatbot Example

Second, EDA scripting is an important part of the industrial chip design process. In the past, engineers had to learn internal script libraries, consult tool documentation, and debug scripts, which consumed a great deal of time. The researchers therefore generated two different types of scripts from natural-language task descriptions, targeting Tool1 (Python) and Tool2 (Tcl). Engineers can query the model and run the generated code in the same interface, and can see how many corrections are needed to reach a runnable script.

Integration of LLM script generator with EDA tools
EDA Script Generator Example

Third, for bug summarization and analysis, the researchers used NVIDIA's internal bug database, NVBugs, and also built a domain-specific SFT dataset.

Bug Summary and Analysis Example

The researchers conducted a comparative evaluation of ChipNeMo's performance on chip design knowledge, EDA scripts, bug analysis, circuit design, and MMLU (Massive Multitask Language Understanding).

The results show that ChipNeMo's performance improves with the parameter size of the base model, and that domain-adaptive pretraining provides significant gains over the base model. Moreover, the best ChipNeMo model outperforms GPT-3.5 on all benchmarks and outperforms GPT-4 on the design-knowledge and bug benchmarks.

In addition, on chip design tasks, the custom ChipNeMo model with only 13 billion parameters matches or exceeds the performance of much larger general-purpose models such as the 70-billion-parameter LLaMA2.

Designing chips with large models is nothing new

Currently, ChipNeMo is for internal use only, and because it is trained on Nvidia's internal data, it is unlikely to be open-sourced. Nevertheless, as the graphics card giant, Nvidia's move to optimize its workflows with large language models remains quite inspiring for the industry.

On the one hand, chip design's high barrier lies not only in technology but also in experience and cost; every step from design to implementation to production can become an "overtaking point" in industry competition. For start-ups that began late and lack experience, large models offer a way to absorb others' strengths in a shorter time, almost like directly hiring an experienced engineer. This, however, requires more open-source data and model support.

On the other hand, while large models keep amazing the world in chatbot form, many companies that want to build LLMs on open-source models better suited to their own industry and business back off, deterred by high training costs and concerns about training-data security. NVIDIA's experience confirms this: the 128 A100 GPUs used to train ChipNeMo are not within every company's reach.

It is worth noting that ChipNeMo is not the first time that large models have been used in the chip field.

As early as May 2023, researchers at the New York University Tandon School of Engineering designed a microprocessor chip by "talking" to an AI, the first time artificial intelligence had been used this way.

Paper link:
https://arxiv.org/abs/2305.13243

“I’m not a chip design expert at all,” said Hammond Pearce, a professor at New York University, in an interview. “This is the first chip I’ve ever designed. I think that’s one of the reasons why this is so impressive.”

Specifically, the researchers successfully used GPT-4 to design an 8-bit accumulator microprocessor through 124 conversations, which was manufactured via the Skywater 130nm shuttle.

The day after that research was published, the Institute of Computing Technology of the Chinese Academy of Sciences posted ChipGPT on arXiv, stirring up heated discussion once again. The researchers describe ChipGPT as an attempt to explore the feasibility of automatically generating logic designs from natural-language chip specifications, using current LLMs to reduce the cost of hardware front-end design, which traditionally demands high expertise and heavy manual labor.

Paper address:
https://arxiv.org/abs/2305.14019

The research conclusions show that, compared with traditional agile methods, ChipGPT reduces code volume by a factor of 5.32 to 9.25. In its area-optimized mode, ChipGPT cuts chip area by up to 47%, outperforming the original ChatGPT baseline.

In addition, optimizing chip design with AI is not a new idea, and major players beyond NVIDIA have made their own moves. In 2021, a Google team published "A graph placement methodology for fast chip design", a deep reinforcement learning approach to chip floorplanning; in 2022, NVIDIA released PrefixRL, a circuit design method based on deep reinforcement learning.

However, ChipNeMo builds on years of accumulated internal experience and is a customized model, so it is bound to hold advantages in application fit and efficiency. In this era of fierce AI chip competition, Nvidia, already far in the lead, is still using AI to improve its own efficiency. Perhaps it, too, feels the pressure from its pursuers?

References:
https://blogs.nvidia.cn/2023/10/31/llm-semiconductors-chip-nemo
https://mp.weixin.qq.com/s/cRa-qAUTB2czlUcGb4YiDw
https://mp.weixin.qq.com/s/54BCR1wMoncvRYfaccNk3g