China releases first AI large language model for ancient book research

(Global Times) 11:00, December 14, 2023

Illustration of the Xunzi artificial intelligence large language model Photo: njau.edu.cn

A university research team from East China’s Jiangsu Province has recently released China’s first large language model (LLM) built to support research on ancient Chinese books. An LLM is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massive data sets.

The LLM was designed to process ancient texts intelligently, promote innovation in the research and preservation of ancient Chinese books, improve the efficiency and quality with which traditional Chinese culture is passed on, and facilitate deep integration between LLMs and the processing of ancient books.

The LLM “Xunzi” is named after Xun Zi, one of ancient China’s most famous philosophers, known for the Confucian classic Xunzi. It covers the vast majority of ancient Chinese books and documents, including the collections of the “Complete Library in Four Sections,” or “Siku Quanshu,” with a large-scale corpus of over 2 billion Chinese characters and words.

With the model, researchers can quickly summarize ancient texts and identify the themes of ancient books. The model can also extract key information from the texts, such as characters, events and places, so that the information can be organized efficiently.

In addition, the model can automatically generate ancient-style poems that comply with grammar and prosody rules based on prompts from users, providing inspiration for poetry lovers. It can also precisely translate ancient texts into modern Chinese to help researchers understand their original meaning and connotations.

Led by Wang Dongbo, a professor at the College of Information Management of Nanjing Agricultural University in Nanjing, Jiangsu, the research team has been working on the digitization of ancient books and documents for a decade. Supported by the university’s strong computing resources and drawing on application scenarios provided by Zhonghua Book Company, the team built China’s first open-source LLM for ancient texts.

The LLM has been published on websites such as github.com and modelscope.cn as open-source software, allowing users to download and use it for free.

(Web editor: Tian Yi, Liang Jun)

