Microsoft Mahjong AI Paper Released, Revealing Technical Details for the First Time

6 years ago

Remember the "Que Shen AI" Suphx released by Microsoft in August last year? Recently, the research team published an updated paper on arXiv, further introducing the technology behind Suphx.

On August 29, 2019, Microsoft released a mahjong AI called Suphx (Super Phoenix). On a professional mahjong competition platform, Suphx's strength surpassed the average level of top human players.

Once released, Suphx attracted widespread attention, not only in the artificial intelligence community but also among mahjong enthusiasts, many of whom came to watch and discuss it. (For background, see the earlier article "The Artificial Intelligence of the Hu Family Is Coming".)

**The number and average size of information sets of Mahjong exceed those of Bridge, Texas Hold'em, and Go.**

People say that the system is more complex than AlphaGo, which defeated professional Go players, and is hailed as the "strongest Japanese Mahjong artificial intelligence."

Today, the system's development team published a paper on arXiv, "Suphx: Mastering Mahjong with Deep Reinforcement Learning", which explains the technology behind Suphx in more depth.

**Suphx: Mastering Mahjong with Deep Reinforcement Learning**
**Paper address: https://arxiv.org/pdf/2003.13590.pdf**

Suphx is getting stronger and stronger: it has surpassed 99.99% of players

As we have previously reported, the Suphx system uses deep reinforcement learning to learn and gain experience from 5,000 games, and it has defeated many mahjong players on the Japanese professional mahjong competition platform Tenhou (also written "Tianfeng"). It reached rank ten, the highest level attainable in the platform's "Te Shang Fang" room.

**Suphx's rank on the Tianfeng platform is much higher than that of other Mahjong AIs**

How was such a powerful Mahjong AI created? A research team from Microsoft Research Asia, Kyoto University, University of Science and Technology of China, Tsinghua University, and Nankai University gave an in-depth introduction in the latest version of the paper.

From the paper, we also learned that Suphx has kept improving with further training. On the "Tianfeng" platform, which has more than 350,000 players, it is officially rated above 99.99% of players. This is the first time a computer program has surpassed most of the top human players in mahjong.

Five major models and reinforcement learning create Que Shen AI

Suphx contains a series of convolutional neural networks. It learns five models to handle different scenarios: the discard model, the Riichi model, the Chow model, the Pong model, and the Kong model.

**The discard model (top) and the architecture of the other four models (bottom)**

On top of these, Suphx uses a rule-based model to decide whether to declare a winning hand and win the round. It checks whether a winning hand can be formed with a tile discarded by another player or with a tile drawn from the wall.
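The rule-based check described above can be illustrated with a minimal sketch. It is not the team's implementation: the tile encoding (integers 0 to 33) and the function names are assumptions made for this example, and it only covers the standard "four melds plus one pair" shape of a fully concealed hand, ignoring special hands and scoring requirements.

```python
# Hypothetical tile encoding (an assumption for this sketch):
# 0-8 characters, 9-17 dots, 18-26 bamboo, 27-33 honor tiles.

def can_form_melds(counts):
    """Return True if all remaining tiles split into triplets and runs."""
    # Find the lowest tile that is still present.
    for t in range(34):
        if counts[t] > 0:
            break
    else:
        return True  # nothing left: every tile was placed in a meld
    # Option 1: use three copies of the tile as a triplet.
    if counts[t] >= 3:
        counts[t] -= 3
        ok = can_form_melds(counts)
        counts[t] += 3
        if ok:
            return True
    # Option 2: use t, t+1, t+2 as a run (suited tiles only, same suit).
    if t < 27 and t % 9 <= 6 and counts[t + 1] > 0 and counts[t + 2] > 0:
        for k in (t, t + 1, t + 2):
            counts[k] -= 1
        ok = can_form_melds(counts)
        for k in (t, t + 1, t + 2):
            counts[k] += 1
        if ok:
            return True
    return False


def is_standard_win(hand13, new_tile):
    """Check whether 13 concealed tiles plus one new tile form 4 melds + 1 pair."""
    tiles = list(hand13) + [new_tile]
    if len(tiles) != 14:
        return False
    counts = [0] * 34
    for t in tiles:
        counts[t] += 1
    # Try every possible pair, then test the remaining 12 tiles.
    for p in range(34):
        if counts[p] >= 2:
            counts[p] -= 2
            ok = can_form_melds(counts)
            counts[p] += 2
            if ok:
                return True
    return False


# Three runs in characters, a triplet of the first dot tile, one lone bamboo tile.
hand = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 22]
print(is_standard_win(hand, 22))  # the new tile completes the pair
print(is_standard_win(hand, 23))  # the new tile does not complete the hand
```

The same function serves both cases mentioned in the text: `new_tile` can be a tile another player just discarded or a tile drawn from the wall.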

It is reported that the training process of Suphx is divided into three steps.

First, its five models are trained using logs of top human players collected from the Tianfeng platform.

The models are then improved through self-play reinforcement learning, in which CPU-based mahjong simulators and GPU-based inference engines are used to generate trajectories.

Finally, during online games, run-time policy adaptation uses the new observations from the current round to make the system perform even better.

**Distributed reinforcement learning system in Suphx**

Since the opponents' information is hidden in Mahjong, Suphx uses a "prophet coaching" technique (called oracle guiding in the paper) to improve the effectiveness of reinforcement learning. During the self-play training phase, hidden information is used to guide the direction of model training, which strengthens the AI model's understanding of the visible information and helps it find an effective basis for its decisions.
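One way to picture the "prophet coaching" idea is to give the model both visible and hidden features during training and then gradually withhold the hidden ones, so that the final model relies on visible information only. The sketch below is an illustration under stated assumptions, not the team's code: the linear decay schedule, the feature lists, and the function names are all hypothetical.

```python
import random


def keep_probability(step, total_steps):
    """Chance of keeping each hidden feature; decays linearly from 1 to 0.

    The linear schedule is an assumption made for this sketch.
    """
    return max(0.0, 1.0 - step / total_steps)


def mask_hidden_features(visible, hidden, keep_prob, rng):
    """Join visible features with hidden ones, zeroing each hidden feature
    independently with probability 1 - keep_prob."""
    masked = [h if rng.random() < keep_prob else 0.0 for h in hidden]
    return list(visible) + masked


rng = random.Random(0)
visible = [1.0, 0.0, 1.0]      # e.g. own hand and open discards (toy values)
hidden = [1.0, 1.0, 1.0, 1.0]  # e.g. opponents' hands and the wall (toy values)

for step in (0, 500, 1000):
    p = keep_probability(step, 1000)
    print(step, p, mask_hidden_features(visible, hidden, p, rng))
```

At the start of training every hidden feature is passed through, and by the end all of them are zeroed, which matches the inputs the model will actually see in a real game.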

Evaluation: 5,760 games, a rank-ten record

Prior to the experiments, the team trained each model for two days using 1.5 million hands on 44 GPUs (four Nvidia Titan XPs for the parameter server and 40 Tesla K80s for the self-play workers).

The team evaluated Suphx on 20 Nvidia Tesla K80 GPUs. To reduce the variance of the stable rank, they drew 1,000 random samples of 800,000 games each from a dataset of more than 1 million Mahjong games.
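The resampling procedure above can be sketched in a few lines. Everything specific here is an assumption for illustration: the stable rank formula `(5 * n1 + 2 * n2) / n4 - 2` (where `n1`, `n2`, `n4` count first, second, and fourth places), sampling without replacement, and the toy pool of 100 placements standing in for the real game logs.

```python
import random
import statistics


def stable_rank(placements):
    """Stable rank from a list of placements (1-4).

    Formula assumed for this sketch: (5 * n1 + 2 * n2) / n4 - 2.
    """
    n1 = placements.count(1)
    n2 = placements.count(2)
    n4 = placements.count(4)
    if n4 == 0:
        raise ValueError("stable rank is undefined without any fourth places")
    return (5 * n1 + 2 * n2) / n4 - 2


def resampled_stable_rank(placements, sample_size, repeats, seed=0):
    """Draw `repeats` random subsets of `sample_size` games (without
    replacement) and return the mean and standard deviation of the
    stable rank across the subsets."""
    rng = random.Random(seed)
    ranks = [stable_rank(rng.sample(placements, sample_size))
             for _ in range(repeats)]
    return statistics.mean(ranks), statistics.stdev(ranks)


# Toy pool: 30 firsts, 25 seconds, 25 thirds, 20 fourths.
pool = [1] * 30 + [2] * 25 + [3] * 25 + [4] * 20
print(stable_rank(pool))
print(resampled_stable_rank(pool, sample_size=80, repeats=50))
```

Reporting the mean together with the spread across subsets gives a more reliable picture than a single stable rank computed from one set of games.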

The evaluation results show that on the "Tianfeng" platform, after playing more than 5,760 games against human players, Suphx reached a record of rank ten, a level that only about 180 players have ever reached. Its stable rank is 8.74 (that of the top human players is 7.4).

**Reinforcement learning agent final stable ranking statistics**
**Through continuous optimization, RL-2 finally achieved better performance**

Interestingly, the researchers wrote that Suphx's defense is "very strong," with a low deal-in rate of 10.06%, and that it has developed its own playing style: it tends to keep safe tiles in hand and favors winning with a half-flush.

**The AI player (South) chooses to play conservatively,**
**giving up the boxed six of bamboo because that tile is already on the table**

In addition, the co-authors of the paper wrote that most real-world problems, such as financial market forecasting and logistics optimization, share characteristics with Mahjong, including complex operating and reward rules and imperfect information.

The authors believe that the techniques designed for Suphx, including global reward prediction, prophet (oracle) guidance, and run-time policy adaptation, have great potential and can be widely used in the future to help solve complex real-world problems.

After reading this, are you eager to try it yourself? The Tianfeng mahjong platform is at https://tenhou.net/. Let's play a game together!

-- over--
