HyperAI

[OSCAR Open Source Industry Conference Sub-Forum] Where Is the Open Source Big Model Going?


In two months, ChatGPT will celebrate its first anniversary. As a pioneering AI model, ChatGPT has acted like a stimulant injected into the central nervous system of thousands of industries, setting off an unprecedented AI arms race around the world.

In the past year, we have seen GPT-3.5 evolve into the multimodal GPT-4, and Google PaLM, which claims to have 562 billion parameters, give way to PaLM 2, which has a smaller parameter scale and is more efficient. We have also seen Meta open-source its Llama large model, which spawned an "alpaca family" of lower-cost, smaller-scale models including Alpaca, Vicuna, Koala, and Falcon.

In just a few months, the open source large model community has flourished and is showing signs of competing with closed source, even shocking Google into worrying that it has "no moat." In July, Meta released Llama 2, an open source version that is free for commercial use and performs comparably to GPT-3.5. This directly upended the large model landscape and killed off some closed-source large models whose in-house capabilities were not as good as Llama 2.

As a result, many people have proclaimed that "the Android era of large models, when everyone has one, is coming soon." But we should also see that behind the bright picture of open source large models lies a series of challenges involving talent, organization, data, and commercial restrictions. Abroad, there are open source large models as powerful as Llama 2; when will domestic open source large models be able to keep pace with them? And where will the debate between open source and closed source ultimately lead large models?

Author | Tower

Editor | Sanyang

On September 21, the China Academy of Information and Communications Technology and the China Communications Standards Association jointly organized the 2023 OSCAR Open Source Industry Conference. At the "Open Source Big Model" forum held in Beijing, jointly hosted by Segmentfault and HyperAI, experts from the scientific research, industry, and investment communities held a wide-ranging discussion on "Opportunities and Challenges in the Development of Open Source Big Models."

The three guests of this roundtable discussion are: Wang Wei, professor at the School of Data Science and Engineering of East China Normal University and director of Open Source Society; Sha Jian, senior technical expert at Ant Group; and Xu Kaiyong, deputy general manager of a well-known investment institution. The host is Wang Chenhan, founder and CEO of OpenBayes Bayesian Computing.

Roundtable Forum: "Opportunities and Challenges of Open Source Big Model Development"

From left to right:

Moderator: Wang Chenhan, founder and CEO of OpenBayes Bayesian Computing

Wang Wei, professor at the School of Data Science and Engineering, East China Normal University, and director of Open Source Society

Sha Jian, senior technical expert at Ant Group

Xu Kaiyong, deputy general manager of a well-known investment institution

Click the link below to go directly to the forum ☟

https://www.bilibili.com/video/BV1oF411m7yc/?spm_id_from=333.999.0.0&vd_source=5e54209e1f8c68b7f1dc3df8aabf856c

Without altering the original meaning, we have summarized the highlights of this conversation below. Please join us in hearing the experts' insights.

Discussion on the latest progress of open source big models

Moderator: Wang Chenhan, founder and CEO of OpenBayes Bayesian Computing

Since the launch of ChatGPT last year, we have seen the release of GPT-4, Anthropic's Claude following close behind, rapid updates to the Llama family in the open source community, the emergence of a number of local model companies in China, and growing competition within the open source community. All of this shows that large models are developing quite fast. The three guests present come from the scientific research, industry, and investment communities, and I would like to ask each of you to evaluate the current status and future trends of the large model field from your own perspective.

Q1: How big is the gap between the open source community as a whole and GPT-4? Is there a critical point at which the combined achievements of the open source community will exceed the most advanced level of any commercial company?

Wang Wei: People often treat open source and closed source as two opposing approaches. But I personally think that these two approaches actually represent different business strategies. Even open source companies need to invest huge resources. Lagging companies can catch up with the leaders through open source, and leading companies can also gain multi-faceted perspectives through open source.

From the perspective of long-term development, commercialization is very important. In addition to commercialization, if we want to expand the ecosystem or developer community in the short term, open source provides a huge advantage. Since Llama 2 was open sourced in 2023, it has not only attracted a large number of developers, but also attracted many professionals in tool chains, industries, and evaluations, which has caused some pressure on OpenAI.

The biggest benefit of open source is that it allows everyone to see what a project wants to do and how it intends to do it. From the perspective of schools, open source provides a convenient research channel for university scholars, who in turn produce valuable research results that promote the development of open source technology. So I have always thought that open source is a good business strategy.

Sha Jian: From the perspective of the industry, large models have developed rapidly in the past two years. Barring a major technological wave or breakthrough, model architectures have in fact tended to converge.

For commercial companies, on the one hand, open source models are conducive to increasing their own influence and accelerating technological iteration; on the other hand, from the perspective of model effects, the model is more related to training data and training methods, and closed-source companies may have many unique features in these aspects.

But looking back over the long course of history, we have always believed that no technological closure can hinder progress. Eventually there will be no need for closed-source companies to keep their technology hidden.

Xu Kaiyong: In the investment community's view, open source will definitely catch up with closed source, but the specific time is difficult to estimate. I personally think that open source may catch up with closed source in the next 2 to 3 years, because in terms of the model itself, closed source does not have much first-mover advantage.

First-mover advantages generally fall into two categories. The first is a fixed technical path: for example, when chip manufacturing progresses from 7 nanometers to 3 nanometers, latecomers have to follow the same path to get there. Large models do not follow such a path. A large model involves two important aspects: one is data, and the other is training methods.

Although the training methods are currently controlled by advanced closed-source companies like OpenAI, once the community finds a better solution, or employees of closed-source companies leave and join the open-source community, the open-source methods will be rapidly enhanced. Therefore, the large model itself does not have too many first-mover barriers, which is the first point.

The second point is the network effect. For example, group-buying apps have many merchants and users, so they have network effects. Large models themselves do not have this characteristic, so closed-source large model companies do not enjoy the barrier that network effects provide.

Therefore, I think open source will definitely surpass closed source, but the timing will depend on the current status and progress.

There are two development directions in the field of open source large models in China. One is to follow in the footsteps of advanced international models, for example by localizing Llama; the other is represented by leading teams such as Baichuan Intelligence, which release their own Chinese large models. Overall, Chinese large models are booming, but judging from the evaluation data, there is still a certain gap between the activity of the Chinese community and that of the international community.

Q2: In your view, what is the current progress of open source in the field of Chinese large models? Is it catching up with closed source or making original contributions, and in what proportion? What is the open source atmosphere like in China in the field of large models?

Wang Wei: We often subconsciously compare China's open source environment and atmosphere with those in the West. In fact, open source itself is a global phenomenon. It means that the work can be accessed, disseminated, and modified anywhere in the world.

Open source originated in Europe and the United States, and it has been decades since the foundations of Linux and Apache were established. In contrast, large open source conferences like OSCAR have only just begun to emerge in China, but even so, we have made great progress, as can be seen from the various achievements released at the branch venues every year. In addition, the importance of open source at the national level is also increasing day by day, and more and more people in China are contributing to global open source.

Furthermore, the Chinese large model is a unique innovation for the world, because Chinese is a very distinctive and rich language with a wide range of users. Our Chinese large models are not about confrontation or competition, but a reflection of cultural diversity. There are also many multilingual evaluations and applications internationally, and we have courses and projects such as International Chinese that have wide application value in the era of large models.

If we want to evaluate the specific extent of the Chinese big model, I personally think we should look at the final application results. The reason why this round of AIGC triggered by ChatGPT is so popular is that it has significant advantages in the generation of text and graphics. If the Chinese big model can be implemented in better application scenarios such as education and international exchanges, then its influence and advancement can naturally be reflected.

Sha Jian: First of all, in terms of evaluation, the early GPT-4 already came with multilingual evaluations. It can in fact work across languages, although it may perform better in mainstream languages and worse in less widely used ones.

At this stage, the reason many Chinese institutions need to develop Chinese models is that both the country and enterprises hope to master core technologies, even if doing so costs more than directly calling someone else's service and does not necessarily produce better results.

Secondly, from the perspective of the community, the atmosphere of the entire Chinese community, including the open source community, is indeed not as good as that of the West, but in fact, many Western foundations, including Apache and Linux Foundation, are now building Chinese branches. These well-known foreign foundations and Chinese branches are expected to drive the Chinese community. We actually hope to see influential local foundations develop their own communities.

Xu Kaiyong:I think there is still some gap between the Chinese big model and foreign big models. Foreign big models support multiple languages, while domestic development is slower. In addition, few students in China use the Chinese big language model to do homework, write essays or solve math problems, but this phenomenon is actually very common abroad.

This is partly because domestic large models sometimes make mistakes, and users tend to poke fun at them. So I think Chinese large models still have a long way to go, but Chinese has its own linguistic characteristics, and there are still many opportunities for domestic large models to survive and develop.

Now, in addition to the large model itself, people are paying more and more attention to other projects across the large model ecosystem, including data sets, training methods, chip foundations, chip cluster-related software, and inference-related software ecosystems.

Q3: Are you paying attention to other tool components or commercial companies in the field of open source large models?

Wang Wei: In addition to the ecosystem, I also focus on legal, regulatory, and compliance issues.

From the school's perspective, the social impact of an enterprise is more important than its development. Especially for big models, we often talk about issues such as governance, compliance, and ethics. Big models are no longer a simple technology that belongs to the industry alone. Everyone can use it to generate text and pictures. Under its huge influence, there are also potential security issues.

These questions map onto data and technical tools. For example, ensuring the quality, privacy, and security of the data used to train a model requires not only the efforts of engineers, but also strong support from professionals such as lawyers. We build this foundation together, and on top of it we then focus on the chip layer, software, and the other layers above.

On the technical level, I am more concerned about the basic tool chain. These tool chains may not have the direct commercial value that commercial companies look for, so universities have more opportunities to work on them. At present, many universities, such as Fudan, are building basic software of this kind, which is also something our country currently lacks. Although these tool chains do not have much commercial value in themselves, they are core, foundational pieces. Therefore, from the perspective of the school, we pay more attention to these and to the ethical and compliance issues mentioned earlier.

Sha Jian: I will talk about the ecosystem upstream and downstream of large models, and its impact, from the perspective of software and hardware.

There are many interpretations of a large model. In the eyes of algorithm developers, a large model is an algorithmic model that solves general tasks. From an engineering perspective, a large model is simply large: the computing power, data, and number of parameters determine the upper limit of the model's capabilities. But many studies have now begun to focus on making large models lighter, not because so many parameters are unnecessary, but because the software and hardware layers cannot keep up.

Historically, software, hardware, and algorithms have developed in a spiral, each reinforcing the others. But large models have now thrown the ball to software and hardware, and the costs, especially for hardware, are very high.

The biggest problem currently affecting the commercialization of large models is the cost of inference. Training cost matters too, but training remains feasible even when it is slow: the process is offline, and a model with hundreds of billions of parameters can be produced after a month of training. Inference is different. If a single query takes anywhere from a few seconds to a minute to answer, providing the service for free to more than one billion people in the country would be far too costly. If it is charged for, users may be lost. This is the biggest problem.
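To make the cost argument concrete, here is a rough back-of-envelope sketch. Every figure in it (user count, queries per user, accelerator seconds per query, hourly price) is a hypothetical placeholder rather than a number given by the speakers or any vendor; it only illustrates how per-query latency and user scale multiply into serving cost.

```python
def daily_inference_cost(users, queries_per_user_per_day,
                         gpu_seconds_per_query, gpu_price_per_hour):
    """Rough daily serving cost: total accelerator-hours times hourly price.

    All inputs are hypothetical; batching, caching, and peak load are ignored.
    """
    total_queries = users * queries_per_user_per_day
    gpu_hours = total_queries * gpu_seconds_per_query / 3600
    return gpu_hours * gpu_price_per_hour


def gpus_kept_busy(users, queries_per_user_per_day, gpu_seconds_per_query):
    """Average number of accelerators occupied, assuming load is spread evenly over a day."""
    total_gpu_seconds = users * queries_per_user_per_day * gpu_seconds_per_query
    return total_gpu_seconds / 86400


# Hypothetical scenario: one billion users, five queries each per day,
# three accelerator-seconds per query, two currency units per accelerator-hour.
cost = daily_inference_cost(1_000_000_000, 5, 3.0, 2.0)
busy = gpus_kept_busy(1_000_000_000, 5, 3.0)
print(f"daily cost: {cost:,.0f}, accelerators kept busy: {busy:,.0f}")
```

Because cost scales linearly with seconds per query in this simple model, halving inference latency halves the bill, which is one way to read the current interest in lightweight models.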

Currently, all parties across the country are working hard to deal with these issues. At the hardware level, because of restrictions imposed by the United States, the cards we can buy are basically cut-down versions, and major companies are also exploring the use of domestic cards. From my observation, domestic cards perform well on the various computing power indicators.

At the software level, NVIDIA's CUDA-based GPU software stack still has a strong moat, covering everything from the upper-level training frameworks to the underlying operator libraries, and it is a mountain that domestic hardware manufacturers have to climb. All major hardware companies are developing their own software stacks, using different strategies.

But from the perspective of users and developers, I still hope that vendors will adapt their own ecosystems mainly at the compiler level and converge on the mainstream open source frameworks at the user level, so that the only thing users notice is the performance improvement. At present this seems to be the mainstream trend, but major companies certainly cannot reach a complete consensus because of their own business strategies.

So from the perspective of software and hardware, we need some time and technological breakthroughs to catch up with current demand. This is a challenge, but also a great opportunity.

Xu Kaiyong: The investment community pays close attention to the upstream and downstream of the model and its related industries.

For example, at the bottom layer of the model, we look at opportunities in the infrastructure layer such as 3D networks and RDBMS. At the application layer, we focus on opportunities in vertical industries, such as automatic reading of financial reports and announcements or automatic summarization in the financial industry, fault detection in the industrial sector, enterprises with exclusive databases, and startups that provide private large models or small models distilled from large models to solve a single problem or a series of problems.

In addition to the upstream and downstream of large models, investors also look at new possibilities for large models and artificial intelligence. For example, I have recently been looking at fields related to open source and quantum computing, because for traditional artificial intelligence, including large models, performance grows linearly with cost, whereas with quantum computing it grows exponentially.

In the past 10 to 12 years, quite a few companies built on open source technology (or whose parent companies are listed) have gone public on the Nasdaq in the United States, such as Apache and MongoDB. These companies that invested in open source have achieved good business value and returns. In China, however, few companies have invested in the open source ecosystem, and even fewer have gone public or taken the lead.

Q4: Has the business model of China's open source ecosystem really worked? Are there any successful business cases? If so, will large models promote this trend? If not, do large models have the opportunity to become part of it?

Xu Kaiyong: There are basically no listed open source companies in China, but there are many abroad. I think the main difference between China and the United States in this respect lies in talent. The United States attracts global talent and has an open mindset and unique insights. Most of the initiators of open source projects come from Silicon Valley.

There are also many developers/opinion leaders in China who participate in open source. Although there has not yet been an open source listed company, I believe it is possible in the future, especially in the field of large models.

Currently, looking around the world, only China and the United States can make large models. The competition we face in China is more direct, but over the years we have also trained a large number of computer talents, and there are more and more open source participants. Therefore, there is still an opportunity to create a public company in the open source field.

Sha Jian: There do not seem to be any very successful listed open source projects in China, but there are definitely well-known open source projects and startups.

The atmosphere in Silicon Valley is indeed better. After all, it has been developing for many years. Moreover, foreign open source foundations and investment institutions have a good incubation and guidance mechanism for potential open source projects, including community collaboration and commercialization cultivation. Many excellent projects may not have grown up in a wild way. We still need a process of catching up. In addition, the country also needs to continue to invest in these aspects, including education.

In this booming industry in China, if a company wants to go public, it must first have a relatively deep accumulation of technology, and second, it needs a business model that can stand the test of time. I have found that many excellent open source projects have not figured out how to make money from their products, but this is actually the most important thing.

Wang Wei: I would like to make three points. The first is commercial success. I have always believed that commercial success has no necessary relationship with whether something is open source. At the commercial level, it depends more on whether the market needs you and whether you meet customer needs, while open source is now more of a publicity gimmick.

The second point is what open-sourcing a large model actually means. Open-sourcing a model is different from open-sourcing software code. What developers and users can do with a model after it is open-sourced is a new experience for us. Although open-sourcing a model gives people a way to download and use it, it also raises new problems that do not fit easily into the traditional definition or framework of open source. Therefore, how to build a community and ecosystem around a model is a brand new problem.

The third point, and the one I personally care about most, is talent cultivation. I believe open source is very conducive to talent cultivation.

First of all, it allows college students to access the most cutting-edge technology more quickly. After Llama came out, many universities immediately deployed it, fine-tuned it, and added content related to their own fields, all of which benefited from open source.

Secondly, the open source collaboration model is more useful for training students than purely technical training. It greatly improves students' communication skills and teaches them how to work with partners in a relationship that is both competitive and cooperative, which is exactly what Chinese students lack. China lacks mature open source projects like those abroad. On the one hand, this is due to language barriers; on the other hand, it may be related to Chinese habits: we are not good at expressing our opinions in public, yet in a community we need to express opinions grounded in facts. Therefore, open source is very helpful for students to improve their abilities in this area.

I strongly encourage students to participate in open source projects and communities, especially open source projects in China. I also hope that more companies can provide more opportunities for students who actively contribute to the community.

Open source big model from the perspective of scientific research, industry and investment

Current large models can generate code and can even provide engineering architecture suggestions. People say that AI will replace many jobs in the future, especially in the computer field, and that the way we work may change with the emergence of large models.

Q1: Professor Wang Wei, as an open source pioneer and academic leader at East China Normal University, what are your thoughts on the changes AI brings to the cultivation of talent in the computer field? As large models become increasingly powerful, what skills will you focus on when training students?

Wang Wei, professor of the School of Data Science and Engineering at East China Normal University and director of Open Source Society

Wang Wei: We are currently actively embracing open source. Many projects, assignments, and question-and-answer interactions in our courses are run through GitHub repositories. Now that large models are here, our attitude is the same. As long as something can be done with a large model, we encourage students to use the large model. We also encourage teachers to join these practices.

For computer science students and teachers, it is not enough just to use large models; they also need to understand the principles behind them in order to develop better applications and tools. Large models will definitely replace some jobs and professions in the future, but just as with the industrial revolution, although many workers lost their jobs, more new industries and professions were born. Therefore, we tell students that more new industries and occupations will definitely emerge, that they need to prepare for this while they are in school, and that preparation starts with embracing the technology.

Second, the role of entrepreneurs is also very important. You are the organizations that create jobs. After large models came out, new positions such as prompt engineer and tuning engineer emerged, and there will be more and more new positions in the future. These are opportunities created by entrepreneurs.

Ant has done a lot of work in the open source ecosystem, such as SOFA and a series of open source projects in cloud-native middleware. This work can be said to provide a very good ecosystem foundation for cloud-native support across the industry.

Q2: Mr. Sha Jian, can you introduce Ant Group's future direction in the field of open source large models? And as a technical expert, how do you evaluate the results of Ant Group's open source work, and how valuable is this work to the company as a whole?

Sha Jian, senior technical expert of Ant Group

Sha Jian: Ant embraces open source. If an internal project is incubated well, the company encourages the team to open-source it. The company does not set any commercial targets for this; the aim is more to enhance its technical influence and build its image as a technology company.

The lineup in the field of AI or large models can be divided into several parts:

First of all, the most basic part is infra, which is equivalent to a production tool, and all of it will be open-sourced. The first piece, the AI training infrastructure, is already fully released in DLRover, and the entire inference part, as well as GPU virtualization, GPU clusters, and AIDC, will also be released gradually.

At the application layer, some officially announced large models may not be fully open due to data issues, but some vertical large models, such as CodeFuse, are also gradually being open sourced. Now many teams are gradually moving towards open source.

With our own DLRover project, we have also been asking ourselves: why should we open-source it, and what are its potential use scenarios?

Some cloud vendors like Alibaba Cloud and Baidu Cloud need to sell their own services, so they develop their own applications and hardware. However, there are still many organizations that have plenty of hardware and researchers but lack a professional infra team to use that hardware efficiently, and this is exactly where DLRover hopes to help. It amounts to empowering them, or providing them with a complete set of solutions that have been verified within Ant. That is one possibility. There are also end users, such as individual developers who can run one of our components on its own. So the audience is still relatively wide.

We want to build our project for these users, but we don’t have any goals for how to commercialize it or whether it can be commercialized in the future.

From what we have observed, very few Chinese RMB funds invest in open source projects. Previously, the main force investing in open source in China was US dollar funds. As a representative RMB fund in China, the investment institution where Mr. Xu works has directly or indirectly invested in multiple AI chip and large model companies.

Q3: As a fund with good exit performance, what are your thoughts on open source investment? Will you be positive about investing in open source projects in the future, and why?

Xu Kaiyong, deputy general manager of a well-known investment institution

Xu Kaiyong: Open source is a force that cannot be ignored in the software industry. Our company also has a presence in the open source field and has invested in companies in software infrastructure, databases, data governance, and related areas. I personally have faith in information technology, software, and open source. I have been a beneficiary and promoter of open source since I started writing code in college.

The management of our entire fund is also very open, strongly supports investment in the open source field, and continues to pay attention to and promote the discovery of high-quality open source projects. However, not all investment institutions are so open. Some investors do not quite understand open source and think that open source means free, which also increases the threshold for investing in open source.

In the past, open source was indeed mainly invested in by US dollar funds, but US dollar funds have now faded from the mainstream. RMB funds must therefore take up the banner of open source software investment.

Future Outlook

Open source was written into the country's "14th Five-Year Plan" for the first time in 2021. With its excellent creative model of equality, openness, collaboration and sharing, it is continuing to become an important engine for promoting digital technology innovation, optimizing software production models, enabling the transformation and upgrading of traditional industries, and helping companies reduce costs and increase efficiency.

As a representative of cutting-edge emerging technologies, large models are still in an exploratory stage. The open source community can bring together the world's best talents and work together to accelerate the iteration, optimization, and implementation of large models, thereby promoting digital transformation and business success in all industries with high-quality products and services.

Overall, open source large models have unlimited opportunities, but also face many challenges. At present, domestic large models are rushing to enter the market. Who will stand out in the fierce battle of thousands of models? You are welcome to leave your opinions in the comments section.

This article was first published on the HyperAI WeChat public platform.