Master Xianchao of Longquan Temple: Using AI to Identify, Segment and Translate Ancient Scriptures

Master Xianchao of Longquan Temple, often described as the temple with the strongest scientific research capabilities, has spent recent years studying how artificial intelligence can be combined with ancient documents. The Tripitaka team he leads has so far put into practice AI-based automatic punctuation, literary-to-vernacular translation, and ancient text recognition.
Longquan Temple, located at the foot of Fenghuangling in the suburbs of Beijing, can be regarded as the Buddhist temple with the strongest scientific research capabilities in the country and even the world.
A saying of Master Xuecheng, "Buddhism is ancient, but Buddhists are modern", has encouraged the monks of Longquan Temple to engage in scientific research and write code, combining Buddhism with new technologies and making their projects more accessible and international. The results have kept coming, and they have frequently trended in online searches and drawn sustained attention from the outside world.
Recently, Master Xianchao of Longquan Temple participated in a domestic technology conference and shared the technical practices of using artificial intelligence to organize and proofread the Tripitaka.
The birth of Buddhist AI: making Buddhist scriptures easier to read
Master Xianchao originally earned a master's degree in condensed matter physics from the School of Physics at Peking University. He graduated from Peking University in 2007 and converted to Buddhism at Longquan Temple in 2008. Since then, he has been committed to the editing and revision of the Longquan Tripitaka and the study of Buddhist doctrines.
In 2016, AlphaGo's historic victory over Lee Sedol drew Master Xianchao's attention to AI. He then began trying to combine AI with the OCR and automatic punctuation techniques he was researching.

Buddha-native AI solves the pain points of ancient scriptures
The Tripitaka compiled and collated by Longquan Temple is a collection of Buddhist scriptures, also known as the All Sutras. During the more than 2,000 years of Chinese Buddhism, the Tripitaka has been translated, supplemented and revised by successive dynasties.
Dozens of versions have been handed down to this day, the shortest with more than 5,000 words and the longest with more than 120 million words.

In 2012, Longquan Temple started to compile the Tripitaka, planning to complete the work in ten years. The traditional methods of editing ancient books mainly comprise version proofreading, collation, and punctuation. These steps help contemporary readers understand obscure and unfamiliar scriptures as well as possible.
Three years later, Longquan Temple compiled and published the "Nanshan Eight Great Works". The following year, Longquan Temple's Sutra Office was established to explore the use of artificial intelligence technology, and it developed a single-word recognition engine based on deep learning.
In 2017, Longquan Temple established an artificial intelligence and information technology center, developed a whole column recognition engine that can identify various versions of the Tripitaka, and successfully digitized the Tripitaka version of the "Sixty-volume Avatamsaka Sutra".
Master Xianchao currently serves as the director of the Buddhist Canon Office and is responsible for the compilation of the Tripitaka.
Automatic Punctuation: OCR + Deep Learning
To lower the barrier to reading ancient Chinese classics and to improve the efficiency of scholars, Master Xianchao's team has in recent years used technologies including deep learning and OCR to change the traditional way of interpreting the Tripitaka, and has achieved quite impressive results.

Master Xianchao explained that automatic punctuation refers to technology that uses algorithms to add modern Chinese punctuation to ancient texts automatically, without human intervention. This is mainly for the convenience of modern readers.
There have been earlier studies on using artificial intelligence to add punctuation to ancient Chinese texts. However, Master Xianchao said that this earlier work basically just added periods to the texts, an approach he considers "more conservative and more academic."
His team applied deep learning to automatic punctuation. Their system can add punctuation marks such as the period, comma, question mark, exclamation mark, colon, and semicolon to ancient texts with higher accuracy. After verification, the punctuation produced by the Transformer model they developed is "almost indistinguishable" from human punctuation.
RNN+LSTM+ResNet has improved the overall effect
Automatic punctuation, in the field of NLP, is a simple sequence labeling problem. The standard approach to solving this type of problem is to use a recurrent neural network (RNN).
To improve on the plain RNN, the bidirectional RNN was developed: the output at each time step depends not only on the inputs that precede it but also on the inputs that follow it. Later, Master Xianchao's team introduced the LSTM method.
However, the automatic punctuation achieved with these technologies alone was still not very satisfactory. The reason Master Xianchao's team achieved unexpectedly good results is that they introduced the residual network (ResNet) on top of them.
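To make the sequence labeling framing concrete, here is a minimal sketch of how punctuated text could be turned into per-character labels and back. It is not the team's code: the label set, the "O" tag for "no punctuation", and the function names are assumptions made for illustration. A model such as a bidirectional RNN would then be trained to predict one label per character.

```python
# Illustrative only: the label set and the "O" (no punctuation) tag are assumptions.
PUNCTUATION = set("，。？！：；")

def to_labeled_sequence(punctuated: str) -> list[tuple[str, str]]:
    """Turn punctuated text into (character, label) pairs.

    Each character is labeled with the punctuation mark that follows it,
    or "O" if no mark follows.
    """
    pairs: list[tuple[str, str]] = []
    for ch in punctuated:
        if ch in PUNCTUATION:
            if pairs:  # attach the mark to the preceding character
                pairs[-1] = (pairs[-1][0], ch)
        else:
            pairs.append((ch, "O"))
    return pairs

def apply_labels(chars: str, labels: list[str]) -> str:
    """Rebuild punctuated text from raw characters and predicted labels."""
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label != "O":
            out.append(label)
    return "".join(out)

example = "如是我聞。一時，佛在舍衛國。"
pairs = to_labeled_sequence(example)
raw_text = "".join(ch for ch, _ in pairs)   # unpunctuated model input
labels = [label for _, label in pairs]      # per-character training targets
restored = apply_labels(raw_text, labels)
```

In this framing the model never generates text; it only chooses one label per input character, which is why the problem counts as sequence labeling.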

Master Xianchao explained that earlier neural networks had at most a dozen or twenty layers. With more layers than that, training would not converge easily. A residual network can have hundreds or even thousands of layers, and a deeper network helps capture deeper semantic information, which is the key to its success.
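The idea behind a residual block can be sketched in a few lines. This is a toy illustration, not the team's network: the scalar "layer", its parameters, and the function names are assumptions. Each block outputs its input plus a learned transformation, y = x + F(x), so even a block that contributes nothing still passes the signal through unchanged.

```python
import math

def layer(x, weight, bias):
    """A toy elementwise layer: tanh(weight * v + bias) for each element."""
    return [math.tanh(weight * v + bias) for v in x]

def residual_block(x, weight, bias):
    """Residual form: the output is the input plus the layer's output, y = x + F(x)."""
    fx = layer(x, weight, bias)
    return [a + b for a, b in zip(x, fx)]

def deep_stack(x, depth, weight, bias, residual):
    """Apply the same toy layer `depth` times, with or without skip connections."""
    for _ in range(depth):
        x = residual_block(x, weight, bias) if residual else layer(x, weight, bias)
    return x

signal = [0.5, -1.0, 2.0]
# With zero weights each layer computes F(x) = 0, standing in for layers that
# have not learned anything useful yet.
plain = deep_stack(signal, depth=100, weight=0.0, bias=0.0, residual=False)
skip = deep_stack(signal, depth=100, weight=0.0, bias=0.0, residual=True)
```

Here `plain` collapses to zeros after the first layer, while `skip` still equals the original signal after 100 blocks. That preserved path is one common explanation for why very deep residual networks remain trainable.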
The team also tried to use convolutional neural networks (CNNs). The final result was that the residual network had an average punctuation accuracy that was about 20-30% higher than that of the convolutional neural network.
How efficient is the AI automatic punctuation tool? Using it, Master Xianchao completed the punctuation of an ancient Chinese text of about 20,000 words in one day. At the typical rate of 15 yuan per thousand words for punctuating ancient texts, that is equivalent to creating 300 yuan of economic value in one day. Even if only 60% of the automatic punctuation is counted as accurate, it still creates 180 yuan of value per day.
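The arithmetic behind these figures can be checked directly. The variable names below are illustrative; the numbers are the ones quoted in the article.

```python
words_per_day = 20_000        # text punctuated in one day
rate_per_thousand = 15        # yuan paid per 1,000 words of punctuation work
accuracy_percent = 60         # pessimistic accuracy assumed in the article

# Integer arithmetic keeps the results exact.
full_value = words_per_day * rate_per_thousand // 1000
discounted_value = full_value * accuracy_percent // 100

print(full_value, discounted_value)
```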

At present, since the training data of Master Xianchao's team is mostly taken from Buddhist scriptures, its automatic punctuation is best suited to Buddhist scriptures. However, he said that in the future this technology will also be used in the compilation of ancient documents in more fields, such as classics, history, and miscellaneous works, thus freeing scholars from mechanical and repetitive labor.
In the future, the working mode of ancient book proofreading is expected to change: AI will first break sentences and add punctuation, and professional scholars will then proofread and revise the result.
Master Xianchao's team open-sourced this automatic punctuation online service in 2018. Readers can visit GuJiCool (http://gj.cool) for a trial and apply for free API calls.
Recognition and translation: AI becomes a treasure chest for the Chinese translation of Buddhist scriptures
In addition to automatic punctuation, Master Xianchao also applies AI to many aspects of ancient book research.
Literary and Vernacular Parallel Texts: Alignment & Translation
Literary and vernacular parallel texts pair ancient Chinese with its modern Chinese translation, sentence by sentence. To build them with AI, Master Xianchao first constructed an aligned literary-vernacular corpus and then designed an alignment algorithm, which achieved very good results. Using two independent indicators, similarity and difference, it is easy to locate misaligned sentences.
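The article does not say how the similarity and difference indicators are computed. As a rough sketch of the idea only, the example below uses two stand-ins chosen for illustration: character overlap as "similarity" and deviation from an expected length ratio as "difference". The function names, the thresholds, and the expected ratio are all assumptions, and the sample sentences are generic classical Chinese rather than text from the team's corpus.

```python
def char_similarity(classical: str, vernacular: str) -> float:
    """Jaccard overlap of the character sets (an assumed stand-in for similarity)."""
    a, b = set(classical), set(vernacular)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def length_difference(classical: str, vernacular: str, expected_ratio: float = 1.5) -> float:
    """Deviation of the vernacular/classical length ratio from an assumed expected value."""
    if not classical:
        return float("inf")
    return abs(len(vernacular) / len(classical) - expected_ratio)

def flag_misaligned(pairs, min_similarity=0.2, max_difference=1.5):
    """Return indices of pairs that fail either indicator (thresholds are assumptions)."""
    flagged = []
    for i, (classical, vernacular) in enumerate(pairs):
        too_dissimilar = char_similarity(classical, vernacular) < min_similarity
        too_different = length_difference(classical, vernacular) > max_difference
        if too_dissimilar or too_different:
            flagged.append(i)
    return flagged

pairs = [
    ("學而時習之", "學了又時常溫習它"),        # plausible alignment
    ("有朋自遠方來", "這不是很令人愉快的嗎"),  # translation belongs to another sentence
    ("不亦樂乎", "不也是很快樂嗎"),            # plausible alignment
]
suspicious = flag_misaligned(pairs)
```

Keeping the two indicators independent means a pair can be flagged for either reason, which narrows down where a human proofreader needs to look.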

Since the Tripitaka contains many specialized terms and the corpus of translations from past dynasties is complex, the work can only be handled by specialists in ancient Chinese. The total number of characters in the Tripitaka runs into the hundreds of millions, so relying on a limited number of experts alone would mean an enormous workload. AI therefore takes a great deal of that workload off the experts.
OCR based on deep learning, recognizing ancient texts
The OCR software currently on the market is all designed for printed text, so it does not recognize the scripts found in ancient books and documents very well.
Master Xianchao and his team developed a new OCR engine based on the CNN+LSTM+CTC framework and trained it on a dataset of more than 70,000 full-page images and 1.68 million text line images from the Tripitaka (Koryo Edition).
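In a CNN+LSTM+CTC pipeline, the network emits a score for every character (plus a special blank) at each position along the text line, and a CTC decoding step turns those per-frame scores into the final string. The sketch below shows the simplest variant, greedy decoding, on made-up scores. The alphabet, the blank symbol, and the numbers are assumptions for illustration, not output from the team's engine.

```python
BLANK = "-"  # assumed symbol for the CTC blank class

def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding: take the best class per frame, merge repeats, drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    previous = None
    for index in best:
        # A character is emitted only when the class changes and is not blank.
        if index != previous and alphabet[index] != BLANK:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

alphabet = [BLANK, "佛", "法", "僧"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # 佛
    [0.20, 0.60, 0.10, 0.10],  # 佛 again (same character spanning two frames)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # 法
    [0.10, 0.10, 0.60, 0.20],  # 法 again
    [0.90, 0.00, 0.10, 0.00],  # blank separates two genuine 法 characters
    [0.10, 0.10, 0.70, 0.10],  # 法
    [0.10, 0.10, 0.10, 0.70],  # 僧
]
text = ctc_greedy_decode(frame_scores, alphabet)
```

The blank class is what lets the decoder distinguish one wide character from the same character written twice in a row, so no per-character segmentation of the line image is needed beforehand.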

Ultimately, the OCR method they developed is capable of single-word recognition, single-column recognition, and semi-automatic multi-column recognition of ancient books, and can effectively complete the digitization of various types of ancient books.

Master Xianchao has also shared more project practices and insights into Buddhism on his WeChat public account "Xianchao Little Monk" (WeChat ID: xianchaofashi). Interested readers can follow it.
Technology and Buddhism: Different externalizations of compassion
Buddhism and technology are not far apart.
We have also reported on the trend of integrating Buddhism and technology in the article "In this century, Buddha sent robots to spread Buddhism". The Xian'er robot, machine Guanyin, smart Buddhist beads, and other inventions that have emerged in recent years have long shown that technology has been deeply and harmoniously integrated into Buddhism.

Master Xianxin, another well-known monk of Longquan Temple and founder of the IT Meditation Camp, was asked about the relationship between Buddhism and technology in an interview.
He replied: Technology is the pursuit of the truth of the material world, while Buddhism is the pursuit of the truth of the inner world. Many people who have made scientific and technological explorations initially wanted to contribute to humanity, which is consistent with the Buddhist pursuit of the greatest compassion. This is what science and technology and the Dharma have in common.
References:
Xianchao Little Monk WeChat Account: "The Collision and Integration of Artificial Intelligence and Chinese Civilization"
2050 Yunqi Conference: "Master Xiandu - Technological Practice of Longquan Temple"
Longquan Temple automatic punctuation tool: http://gj.cool/gjcool/index