Zheng Wei of Nankai University on Five Rounds of CASP, the Benchmark for Protein Structure Prediction: Tougher Competition, Harder Targets, and a Focus on Practical Biological Problems

"Before CASP14, many research groups saw that DeepMind was taking part and assumed its results would be similar to last time (CASP13), so no one took it very seriously." At the time, Professor Zheng Wei of Nankai University was studying and working in Professor Zhang Yang's laboratory at the University of Michigan. He competed in CASP with that team three times, witnessing the debut of the first-generation AlphaFold and the rise to fame of AlphaFold 2.
As the Chinese saying goes, "Laymen watch the spectacle; experts watch the technique." While the media celebrated AlphaFold's victory, its competitors in CASP13 were far less surprised or excited than the outside world. Zheng Wei recalled that AlphaFold at that point had not yet broken away from the "distance prediction" framework, and everyone agreed that "if we tried, we might be able to surpass AlphaFold within a few months." At the same time, many felt that real methodological innovation would be hard to come by in the short term, and that the field had even entered a "bottleneck period."
For this reason, few people initially expected much from DeepMind in CASP14.
On the last day of November 2020, CASP14 announced its results. Zheng Wei's team won the server group. Along with the results, the organizing committee passed on another intriguing piece of news: "One group's performance was outstanding, very different from the others and far ahead of the other participating teams." He soon realized that DeepMind might have "come up with something big."
The results speak for themselves: AlphaFold 2 was excellent. "It really surprised us; they did a really good job," Zheng Wei said. In their analysis at the time, he and his colleagues concluded that "AlphaFold 2 integrated the best results and experience of academic research groups over the years and invested far more effort in model training to find the optimal solution. Its performance is truly eye-catching."
The subsequent CASP15 was billed as the first heavyweight event of the "post-AlphaFold 2 era." With AI-driven protein structure prediction drawing growing interest, the number of participating teams rose sharply and the competition attracted much wider attention. From basic to applied research, and from academia to industry, everyone was hoping for more surprises. This was Zheng Wei's fourth CASP. Having made the transition from structure optimization (refinement) to structure prediction, he had accumulated extensive experience. In an even more intense competition, the DI-TASSER and DMFold-Multimer algorithms he developed took first place in several categories.

Comparison of AlphaFold 2 prediction results with experimental structures (true structures)
In 2024, CASP16 arrived as scheduled, and Zheng Wei, now back at Nankai University, led his own team into the competition. He entered a wider range of tracks and events, and even with AlphaFold 3 available, he chose to "stick to the roots," still taking the lead in multiple tracks.
After the results were announced, HyperAI had the privilege of interviewing Professor Zheng Wei in depth. Drawing on this bellwether international competition, he analyzed current trends in the field and, based on his own experience, reflected on how researchers in AI for Science grow.
In addition, Professor Zheng Wei will give an online live lecture at 19:00 on January 15, sharing his work on deep learning-based prediction of the three-dimensional structures of biological macromolecules and their interactions. Book a spot to watch!
Getting Started with CASP: From Optimization to Prediction
Zheng Wei earned his bachelor's, master's, and doctoral degrees at Nankai University. He originally studied information science in the School of Mathematics, but the school already offered a bioinformatics course, and several faculty members were working on protein structure. So when he decided to shift from basic mathematical research toward applications, he chose this direction. "I encountered the problem first, and then the tools." During his master's studies he began focusing on protein structure. At the time, AI was far less widely used in this field than it is today, so the tools he worked with were "relatively diverse," including statistical tools, traditional algorithms, machine learning, and deep learning.
Like many graduate students, he hesitated as his master's degree drew to a close: should he look for a job or continue to a PhD? "During an exchange at Keio University in Japan, I experienced its rich academic atmosphere and decided to continue on the path of scientific research." Looking back, his two exchange periods, in Japan and in the United States, had a profound impact on him.
In 2015, in the last two years of his doctoral studies, he went to the University of Michigan as a joint-training student and grew rapidly in Professor Zhang Yang's laboratory.
"I thank Professor Zhang Yang for introducing me to structure prediction." As mentioned above, Zheng Wei competed in CASP three times with Professor Zhang Yang's laboratory, gaining a great deal of practical experience in what is known as the "Olympics of protein structure prediction."
A few months after arriving in the United States, he entered the CASP12 protein structure refinement competition. As a newcomer, his results were not ideal, but the experience made his interests clear: if you can improve the accuracy of structures predicted by others, why not predict protein structures yourself?
"Based on that simple logic, I decided to go straight into structure prediction." At CASP13, following Professor Zhang Yang, he focused on structure prediction, starting with template matching and template retrieval. He then built CEthreader, a template-based structure prediction algorithm, and worked with other team members to develop the CI-TASSER server, which took first place in the server group.
This success gave him a lot of confidence: "I felt that structure prediction was promising and there was real work to be done, so I started to dig deeper into this area."
Looking back on the transition from structure optimization to structure prediction, Zheng Wei admitted, "There are challenges, but there are also commonalities." On the one hand, the two directions rely on different methodological systems, so experience cannot simply be borrowed or transferred from one to the other. Optimization must cope with initial models of uneven quality, which may leave little room for improvement or may even contain errors, while prediction starts from scratch, which is obviously difficult. On the other hand, both work with atomic-level spatial coordinates and share much in terms of spatial movements and transformations, so "it's not as difficult as one might imagine."
After deciding to dig deeper into structure prediction, Zheng Wei took part in CASP14 and CASP15. In CASP15, the team focused on two areas, protein monomers and protein complexes, and won protein complex prediction with a score far above those of the other teams.

Comparison of AlphaFold 2 prediction results with experimental structures
Industry trend: Focus on solving practical problems
Held every two years since 1994, CASP has witnessed many of the field's important achievements over the past 30 years and closely reflects development trends in biology. Professor Zheng Wei, now in his fifth CASP, explained that CASP's topics and competition formats are not dreamed up by the organizing committee but come out of focused discussions among community members. The organizers also convene participating teams to hear their suggestions and learn which issues the field currently cares about.
There is no doubt that the teams in this high-level competition are led by senior experts and scholars who have worked in the field for many years, each with unique insights into their own research directions. As Zheng Wei put it, "The directions proposed when everyone sits down together may be the current hot topics in computational structural biology, or problems that urgently need solving and are closely tied to biology."
In other words, CASP has long served as a forum for identifying and tackling the field's most pressing problems.
Looking back at the recently concluded CASP16, he believes that "overall competitiveness and difficulty have both increased." First, the number of participating teams rose significantly compared with previous years. "This year probably had the largest number of teams since the competition began, mainly from academia. Many experienced veteran CASPers took part, so the overall competition was very fierce." He also observed that in recent years more and more teams from China have entered CASP and achieved good results, and the proportion of Asian teams keeps growing. In Korea, for example, bioinformatics has benefited from the recruitment of several leading figures into the field, and the number of participating teams has grown markedly.
Second, the rising difficulty reflects overall technological progress in protein structure prediction on the one hand, and on the other confirms that the field's needs have become clearer, so this competition "leaned more toward practical biological problems."
Discussing why CASP has become harder and broader in the types of targets it sets, Professor Zheng Wei pointed to two main reasons. The first is that prediction accuracy in both academia and industry keeps improving. Between 2015 and 2020, the accuracy of protein monomer structure prediction rose rapidly and academia achieved fruitful results, "pushing monomer structure prediction very close to its limit." After the launch of AlphaFold 2 in particular, industry's resources were brought to bear and models became more capable, raising accuracy to a new level.
As a result, "in some areas it is hard to improve monomer prediction accuracy any further, so everyone has turned to new problems such as protein complexes and protein conformations." This shift is directly reflected in the competition targets, and because there is less prior research in these new areas, the targets can feel more difficult.
The second reason is that over the past ten or so competitions, the targets set by the organizers came with some biological information and background, "which is actually a bit removed from real biological problems," and participating teams generally handled this type of target very well. Take the prediction of a protein complex made of two proteins, A and B. In earlier competitions, the identity and proportions (stoichiometry) of A and B were disclosed to reduce the difficulty of structure prediction, but in real applications this information is obviously not known in advance. This competition therefore changed the way targets were set to better match real situations, requiring teams to predict the complete structure from scratch.
This came as a "surprise" to contestants, Zheng Wei included. He explained that the organizing committee announced on a Wednesday that targets with no prior information would be released starting the following week, leaving only five days to prepare a new pipeline. The team worked "day and night, without sleep" to develop a small algorithm and contacted a familiar "advisory team" of biologists, who helped with inference and calibration based on the biological literature.
In addition to the existing tracks of protein monomer structure prediction (REGULAR), protein complex structure prediction (MULTIMER), estimation of model accuracy (EMA), nucleic acid structure prediction (RNA), and ligand complex structure prediction (LIGAND), CASP16 added a new track for predicting multiple conformations of macromolecules (ENSEMBLES). These six major tracks contain many sub-categories, some of which overlap.
Even so, Zheng Wei achieved outstanding results. His team entered five tracks, all except small-molecule binding, and built a separate algorithm for each track. The team ranked second in the protein monomer single-domain category, first in the nucleic acid polymer server category (z-score > -2.0), first in estimating the global fold accuracy of complexes, first in protein-nucleic acid complex prediction, and first by TM-score in multi-conformation prediction.
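The rankings above are expressed in two common CASP-style measures: TM-score, which rates a model's similarity to the experimental structure on a scale from 0 to 1, and summed z-scores, which rank groups relative to one another across targets. The Python sketch below illustrates both under stated assumptions. It computes TM-score only for a single, already-fixed superposition (the real TM-score program searches for the best one), and it assumes that "z-score > -2.0" means each per-target z-score is floored at -2.0 before summing. The assessors' actual procedures, such as outlier removal and recomputation, differ between tracks and are not reproduced here. The function names and input layout are hypothetical.

```python
import statistics


def tm_score_fixed(distances, target_length):
    """TM-score for ONE fixed superposition (hypothetical helper).

    distances: per-residue distances (angstroms) between aligned model and
    target residues. The real TM-score maximizes over superpositions; that
    search is omitted here.
    """
    # d0 normalizes for target length; the TM-score program also uses a
    # 0.5 angstrom lower bound, and we avoid a negative base for short chains.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)
    # Unaligned residues contribute 0, so divide by the full target length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def summed_z_scores(scores_by_group, floor=-2.0):
    """Sum per-target z-scores for each group, flooring each z at `floor`.

    scores_by_group: {group_name: [score on target 1, score on target 2, ...]}
    (assumed layout: every group has a score for every target).
    """
    groups = list(scores_by_group)
    n_targets = len(next(iter(scores_by_group.values())))
    totals = {g: 0.0 for g in groups}
    for t in range(n_targets):
        # z-scores are computed per target, across all groups.
        column = [scores_by_group[g][t] for g in groups]
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for g in groups:
            z = (scores_by_group[g][t] - mean) / sd if sd > 0 else 0.0
            # Assumed convention: very poor targets cost at most `floor`.
            totals[g] += max(z, floor)
    return totals


if __name__ == "__main__":
    scores = {"A": [0.9, 0.8], "B": [0.5, 0.6], "C": [0.1, 0.1]}
    totals = summed_z_scores(scores)
    ranking = sorted(totals, key=totals.get, reverse=True)
    print(ranking)  # ['A', 'B', 'C']
```

Flooring at -2.0 caps how much a single failed target can drag down a group's total while still penalizing it, which is one plausible reason a ranking would report this variant.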
Notably, on May 8, a week after CASP16 began, the AlphaFold 3 server went live, and some teams started using it in place of their own algorithms. "We were quite confident at the time, so we didn't use AlphaFold 3 much," he said. After discussing it, the team decided to "stick to the roots" and became the only top-ranked team in protein complex structure prediction that did not use AlphaFold 3.

Photo with John Moult, Chairman of CASP Organizing Committee
Looking back now, Professor Zheng Wei laughed: "At the time, we may have been a little too confident." In the author's view, however, making such a decision under intense competitive pressure, against competitors who had all "stacked buffs," and still achieving these results took real courage and strength.
AI4S: Getting Started and Accumulating Experience
Zheng Wei's team turning to biologists for support in CASP16 is in fact a common collaboration model in AI for Science.
AI for Science aims to use AI's strengths to tackle hard problems in traditional scientific fields or to improve efficiency and accuracy. This requires both understanding the needs and pain points of a research field and mastering AI technology. Talent with this interdisciplinary background is clearly rare, and given this trend, many domain researchers have begun teaching themselves AI. Likewise, researchers focused on AI or computing are increasingly turning their attention to fields such as biomedicine, materials chemistry, and geographic information science.
Professor Zheng Wei, who came from an information science background, observed that bioinformatics is "easy to get started with, but the accumulation in the middle takes quite a long time. Once enough has accumulated, breakthroughs come relatively quickly, but the field soon enters another bottleneck, and further improvement may again require a long period of accumulation."

Specifically, protein research has relatively modest requirements for biological background. "Once you understand the 20 amino acids, it may take only a month or so to get started." The next stage, developing and applying algorithms to practical problems, takes time to build up. He recalled, "Throughout my master's studies, I was accumulating foundational work on algorithms."
Protein structure prediction became his way through that bottleneck. During his exchange in Professor Zhang Yang's laboratory at the University of Michigan, he began to dig deeper into related research such as AI-assisted protein structure prediction. "It was during my postdoctoral period that I built up experience in this area and gradually produced results."
As research deepens and its scope widens, a purely "computational" perspective struggles to cover every aspect of a problem and can even lead to dead ends. AI algorithms and models also need to be tested against real biological problems to be iterated and improved, rather than developed in isolation. For this reason, Zheng Wei continually collaborates and communicates with biologists and related teams and institutions.
Interestingly, he keeps reminding his biologist collaborators not to expect too much from AI, since the computational side's error rate may be quite high. It is with this respect for science that, when working on practical problems that can be put into practice, he places great emphasis on "combining dry and wet lab work, with each side informing and complementing the other, so that the results are more solid."
Teamwork and multi-field development
In this interview, Professor Zheng Wei shared his experience in CASP and his observations of the field. Following his journey from CASP12 to CASP16, we can trace how he has changed over the years: from an initial, somewhat unplanned start in structure optimization, to a decisive turn toward structure prediction, to finding real enjoyment in it, and then completing one cycle of accumulation and breakthrough after another.
Today, more faculty members from Nankai University have joined the CASP team. Professor Zheng Wei said, "We need more tracks, or more directions, so that everyone can work together based on the team's interests." For this reason, in CASP16 the team did not concentrate on its traditionally strong tracks but spread its efforts across the whole field. "There are gains and losses. Overall, our results may not be as good as in CASP15, but the team gained experience." This, too, is an unavoidable stage on the "accumulation curve." We look forward to even greater breakthroughs from the Nankai University team in CASP and across bioinformatics!
New member recruitment
The bioinformatics team of the School of Statistics and Data Science at Nankai University, where Professor Zheng Wei is located, is recruiting new members!
If you are interested in computational structural biology, bioinformatics, or data science, you are very welcome to join Professor Zheng Wei's team, whether as a master's student, doctoral student, or postdoctoral fellow.
Interested students can contact Professor Zheng Wei via the following methods:
* Email: jlspzw@nankai.edu.cn
* WeChat: 18622152765
Looking forward to your joining us to explore the mysteries of science together!