An AI That Can Beat All Three Opponents at Mahjong Is Here

Microsoft has released an AI model for playing mahjong that has reached the highest rank open to AI players on a professional competitive platform. Mahjong is a popular pastime across China and around the world, so what difficulties did the "AI Mahjong God" overcome, and what is the deeper significance of this technology?
At the recent World Artificial Intelligence Conference, Microsoft unveiled an "AI Mahjong God", Suphx, whose strength on a professional mahjong competition platform surpasses the average level of top human players.
Suphx, short for Super Phoenix, made its debut in March 2019 on "Tenhou", a professional mahjong competition platform based in Japan.

On this best-known of mahjong platforms, Suphx played more than 5,000 four-player games against human players in the expert room ("Tokujou"), the public room in which AI may take part, and steadily demonstrated its strength.
By June, Suphx had reached 10 dan, the highest rank attainable in the expert room. It could not go on to the eleventh rank, "Tenhoui", only because the platform does not allow AI systems into the highest-level room.
Since Tenhou launched in 2006, about 180 players have reached 10 dan in four-player mahjong, and only a dozen or so active human players currently hold that rank. In terms of stable rank, a measure of long-run playing strength, Suphx reached 8.7 dan, well above the 7.4 dan of human 10-dan players.
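For readers unfamiliar with the measure, a stable rank can be read as the dan at which a player's expected rank-point change per game would be zero. The sketch below assumes the formula commonly cited for Tenhou's expert room, (5 * n1 + 2 * n2) / n4 - 2, where n1, n2 and n4 count first-, second- and fourth-place finishes. Neither the formula nor the sample counts come from this article, and the function name is hypothetical.

```python
def stable_rank(first: int, second: int, fourth: int) -> float:
    """Estimate a stable rank from placement counts.

    Assumption (not from the article): the expert-room formula
    (5 * first + 2 * second) / fourth - 2.
    """
    if fourth <= 0:
        raise ValueError("need at least one fourth-place finish")
    return (5 * first + 2 * second) / fourth - 2


# Made-up placement counts, purely for illustration.
example = stable_rank(first=30, second=25, fourth=20)
print(example)
```

Under this assumed formula, finishing last less often raises the stable rank much faster than winning slightly more often, which is why defensive play matters so much on the platform.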

Two other mahjong AI systems had previously been active on Tenhou: "Bakuuchi", released by the University of Tokyo in 2015, and "NAGA25", released by Dwango in 2018. Both had stable ranks below 6.5 dan, far behind Suphx.
Mahjong with a thousand-year history: a slowly evolving popular leisure activity
Mahjong, also known as "maque" ("sparrow") or "sparrow tiles", is a game that originated in China.
There are many competing stories about the origin of mahjong, and the truth can no longer be verified. What is certain is that ever since it appeared, mahjong has been an enduringly popular national pastime.

The symbols on mahjong tiles and the way the tiles are made have also changed many times. The earliest tiles were made from bamboo and animal bone, and for a period afterwards the game was also played with paper cards.
Among the wealthy and powerful, tiles were made from rhinoceros horn, ivory, gold, silver, copper, and blue-and-white porcelain. In those days, each tile was carved by hand by skilled craftsmen.

It was not until after 1960, with the spread of plastics and mechanized manufacturing, that mahjong tiles could be mass-produced.
Beyond changes in how the tiles are made, the most advanced technology in mahjong, apart from AI, may well be the automatic mahjong table.
AI wins by reasoning
Before AI research took up the game, many people believed that luck was the decisive factor in mahjong. In fact, competitive mahjong is a very complex game.
First, the 136 tiles can be arranged in an enormous number of ways. Between two consecutive discards by the same player come discards by the other three players as well as the player's own draw. In addition, calls such as "chi", "peng", and "gang" change the course of a round dynamically.
Second, mahjong is an imperfect-information problem. Each player knows only their own 13 tiles and the tiles already discarded, while the other players' hands and the tiles remaining in the wall are unknown. This hidden information creates a great deal of uncertainty.
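To make the hidden-information point concrete, here is a minimal sketch of what a player can actually compute: how many copies of each tile are still unseen. It assumes the standard 136-tile set of 34 tile kinds with four copies each. The tile labels, the sample hand, and the function name are illustrative and are not taken from Suphx.

```python
from collections import Counter

# 34 tile kinds: three suits of 1-9 plus seven honor tiles (assumed labels).
SUITS = [f"{n}{s}" for s in "mps" for n in range(1, 10)]
HONORS = ["East", "South", "West", "North", "White", "Green", "Red"]
TILE_KINDS = SUITS + HONORS
COPIES_PER_KIND = 4  # 34 * 4 = 136 tiles in total


def unseen_counts(hand, discards):
    """Return how many copies of each tile kind the player has not seen."""
    seen = Counter(hand) + Counter(discards)
    remaining = {}
    for kind in TILE_KINDS:
        left = COPIES_PER_KIND - seen[kind]
        if left < 0:
            raise ValueError(f"more than four copies of {kind} seen")
        remaining[kind] = left
    return remaining


# The player's own 13 tiles (toy example).
hand = ["1m", "2m", "3m", "4p", "5p", "6p", "7s", "8s", "9s",
        "East", "East", "Red", "Red"]
# Tiles already discarded and visible to everyone (toy example).
discards = ["1m", "North", "9p", "Red"]

unseen = unseen_counts(hand, discards)
print(sum(unseen.values()))  # number of tiles whose location is unknown
```

Every one of those unseen tiles could be in an opponent's hand or still in the wall, and the player has no direct way to tell which.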

Even experienced players find it hard to work out the logical link between the tiles they can see and the best play, and the wealth of hidden information makes the game more complex still.
Playing well therefore takes sound strategic planning from start to finish. For example, when the situation is unfavorable, a player may strategically "let the fourth-place player win" to avoid being overtaken on total score by the player in second place.
So building an expert mahjong AI takes more than raw computing power. Above all, the AI needs intuition, prediction, reasoning, and the ability to make fuzzy decisions.
Becoming a great mahjong player through deep reinforcement learning
To address these difficulties, Microsoft built Suphx with deep reinforcement learning. Using recent algorithms, and through repeated training and tuning, it gradually became the strongest player in competitive mahjong.

The first stage is "initialization". Using public data from the Tenhou platform, the researchers trained an initial model with supervised learning and then, starting from that model, ran reinforcement learning through self-play.
Next, to meet the challenge of an imperfect-information game, Suphx introduced an "oracle coach" technique to make reinforcement learning more effective.
During training, hidden information that a player could not normally see is used to guide the model, making its learning path clearer and closer to the optimal path under perfect information. This pushes the model to understand the visible information more deeply and to find effective strategies in it.
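One way to picture this kind of guidance is to feed the hidden features to the model early in training and mask them out progressively, so that the final model relies on visible information only. The sketch below is an illustration under that assumption, not Microsoft's implementation. The linear decay schedule, the toy feature values, and all function names are hypothetical.

```python
import random


def keep_probability(step, total_steps):
    """Probability of keeping each hidden feature; decays linearly 1 -> 0."""
    return max(0.0, 1.0 - step / total_steps)


def build_training_input(visible, hidden, keep_prob, rng):
    """Concatenate visible features with randomly masked hidden features."""
    masked = [h if rng.random() < keep_prob else 0.0 for h in hidden]
    return list(visible) + masked


rng = random.Random(0)
visible = [1.0, 0.0, 1.0]      # e.g. own hand and discards (toy values)
hidden = [1.0, 1.0, 1.0, 1.0]  # e.g. opponents' hands and the wall (toy values)

for step in (0, 50, 100):
    p = keep_probability(step, total_steps=100)
    print(step, p, build_training_input(visible, hidden, p, rng))
```

At the start every hidden feature passes through, halfway some are zeroed at random, and by the end none remain, so the model that is finally deployed never needs information a real player would not have.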

In addition, to deal with mahjong's complex tile representation and scoring mechanism, the researchers use a whole-game prediction technique to build a bridge between each round and the final result after 8 rounds.
With a carefully designed predictor, the model can understand how each round affects the final result, giving it a global perspective on its decisions.
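A simple way to connect single rounds to the final outcome is to credit each round with the change it causes in a predicted final result. The sketch below assumes that framing. The predictor here is a toy stand-in (a squashed cumulative score) rather than a learned model, and the round scores are invented for illustration.

```python
import math


def toy_predictor(round_scores):
    """Toy stand-in: predicted final outcome from the rounds played so far."""
    return math.tanh(sum(round_scores) / 10000.0)


def per_round_rewards(round_scores, predictor=toy_predictor):
    """Credit each round with the change in the predicted final outcome."""
    rewards = []
    for k in range(1, len(round_scores) + 1):
        before = predictor(round_scores[:k - 1])
        after = predictor(round_scores[:k])
        rewards.append(after - before)
    return rewards


# Invented score changes for one player over an 8-round game.
scores = [3900, -1000, 0, -8000, 2000, 12000, -1500, 1000]
rewards = per_round_rewards(scores)
for k, r in enumerate(rewards, start=1):
    print(f"round {k}: {r:+.3f}")
```

Because the toy predictor is nonlinear, the same point swing is worth more or less depending on the standing so far, which is the intuition behind judging each round from a whole-game view.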
The research team also introduced a new mechanism for dynamic adjustment during play, which lets Suphx adapt its strategy to the latest information at inference time and make adaptive decisions.
The last step is real play: by continually taking part in games against human players, the AI keeps learning and improving its skills.

Since joining Tenhou in March, Suphx has kept evolving. It can now balance attack and defense more cleverly than top human players, weigh short-term losses against long-term gains, and make quick decisions from the ambiguous information available.
Mahjong AI: More than just winning or losing at the table
Thanks to its new algorithms and training techniques, Suphx has a playing style all its own.
A top human player on Tenhou praised Suphx on social media, saying that he had watched many of its games and learned many techniques he had never seen before.
Many other players said they picked up practical playing skills from games against Suphx, and took to calling it a "mahjong textbook" and "Teacher Suphx".

When it comes to winning or losing in mahjong, ordinary people enjoy the thrill of luck and experience, while masters enjoy the contest of intelligence.
Beyond serving as an unbeatable mahjong coach, a "Mahjong God" AI like this can also open up a new perspective, letting us analyze the game through data and algorithms.
Instead of relying on luck like a gambler, players can lean on reasoning, gradually set aside what is random and uncertain, and search for a set of winning principles.
Isn't this the most fascinating ray of light on the road of AI development?

Content reference: Microsoft Research AI headline "Microsoft Super Mahjong AI Suphx, cracking imperfect information games" (https://mp.weixin.qq.com/s/S-axCx41WKDJG2BiGGTZfg)