Colossal-AI Source Code Released in Cooperation with Moore Threads for Joint Industry-Leading Chinese Natural Language Processing (NLP) Model
We cooperated with Moore Threads, a GPU chip design startup, to develop a Chinese NLP model called MusaBert and to add functionality to Colossal-AI that accelerates training of this model. Judged by its first-place result on the AFQMC task of the CLUE benchmark, MusaBert surpasses humans in semantic understanding and matching. The new Colossal-AI functionality is already available as open source software, and the model itself will be released to the public soon.
Conversational AI for Customer Service and Digital Humans
MusaBert currently serves as the base model for Moore Threads projects including conversational AI for intelligent customer service and digital humans. It has also been applied to downstream tasks such as semantic similarity, emotion recognition, reading comprehension, and phonological recognition.
MusaBert Ranked as Industry-Leading Model
On December 16, 2022, MusaBert, the pre-trained language model we co-developed with Moore Threads, entered the top 10 on the CLUE leaderboard. (MUSA is a general meta-computing architecture proposed by Moore Threads.) The Chinese Language Understanding Evaluation (CLUE) is an adaptation of GLUE and an influential leaderboard that ranks models designed for Chinese language understanding.
As one of the best-known evaluation benchmarks for Chinese language comprehension, CLUE covers many semantic analysis and understanding subtasks, such as text similarity, classification, natural language reasoning, and reading comprehension. Both industry and academia use CLUE to verify and measure the capabilities of pre-trained models. Entering the CLUE top 10 indicates that the joint research and development team of Moore Threads and Colossal-AI has reached an industry-leading level in Chinese pre-training research.
MusaBert scored 82.889 and ranked 9th on the CLUE1.1 overall leaderboard (excluding human scores), and took first place in the AFQMC task with a score of 86.92. This result indicates that, in semantic understanding and matching, Moore Threads AI even surpasses humans.
(Figure: the latest results of the CLUE1.1 overall leaderboard as of December 16, 2022)
Compared to the other Chinese pre-trained models in the CLUE top 10, MusaBert contains only 300 million parameters and is a single model with no ensembling. MusaSim, Moore Threads’s semantic embedding model based on MusaBert, beat many large-scale models on the AFQMC task and took first place. This not only lays a solid foundation for deeper semantic research on topics like retrieval systems and classified dialogues, but also demonstrates Moore Threads’s capabilities in Chinese NLP and low-resource large model training.
Building Leading NLP Technology
Semantic understanding has always been an important goal of NLP technology. Through a series of AI algorithms, text can be parsed into structured, machine-readable intent and slot information. In practice, the main difficulties in implementing NLP technology are acquiring and processing training data and iterating on and training the model. With a comparatively small number of parameters, MusaBert can match or even surpass large models with tens of billions of parameters. This is primarily due to the following factors.
Consuming Massive Amounts of Data
In addition to high-quality semantic similarity data collected by Moore Threads itself, MusaBert was trained on 200 GB of BAAI-WuDao open source data, 80 GB of CLUE community data, and 1 TB of high-quality data sets provided by Inspur Electronic Information Industry. This volume of data enables the model to maintain a high performance level despite its relatively small scale.
Integrating Acceleration Hardware and Software
Moore Threads is able to integrate software and hardware, which allows MusaBert to be optimized from the bottom up. Moore Threads's multi-function GPU has built-in hardware modules for AI acceleration and parallel computing, and provides full-stack functionality covering both AI and scientific computing. This delivers general-purpose, cost-effective, energy-saving, and environmentally friendly AI capabilities for application scenarios such as AI inference acceleration and low-resource large model training.
Improved Usability and Performance with Colossal-AI
MusaBert is built on Colossal-AI, a development system for large-scale AI models. For upstream training, Colossal-AI makes the system easier to use and provides strong parallel training performance. At the same time, MusaBert optimizes the data loader used for data preprocessing so that large-scale data can be processed quickly under low-resource conditions. For downstream tasks, Moore Threads exploits the capabilities of the pre-trained language model through appropriate task-specific modeling, in-domain data augmentation, and the Adan optimizer during training. The semantic embedding model MusaSim, developed in-house by Moore Threads, uses MusaBert as its backbone, is trained with contrastive learning, and uses millions of pairs of supervised data collected by Moore Threads. Thanks to MusaBert and high-quality data sets, MusaSim not only surpasses many larger-scale models in semantic similarity tasks, but also achieves more accurate results in classification tasks such as intent recognition and sentiment analysis.
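To make the idea of contrastive training on supervised sentence pairs more concrete, here is a minimal sketch of one widely used contrastive objective, the InfoNCE loss with in-batch negatives. The article does not say which loss, similarity function, or temperature MusaSim uses, so the function names, the cosine similarity, and the temperature value below are all illustrative assumptions rather than the actual MusaSim implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] are the embeddings of a labeled similar
    pair. Every positives[j] with j != i is treated as a negative for
    anchors[i]. The temperature of 0.05 is an assumed value.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the true pair at index i as the target.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional embeddings: the loss is small when each anchor is
# closest to its own positive and large when the pairs are mismatched.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
mismatched = info_nce_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

In an actual training setup the embeddings would come from the backbone encoder and the loss would be minimized by gradient descent; the sketch only shows how the objective rewards pulling paired sentences together and pushing unrelated ones apart.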
Open Sourcing the Code and the Model
To make large model development and application more accessible, the MusaBert code has been open-sourced on the Colossal-AI GitHub. With it, developers can train a high-quality Chinese BERT model in a short time. A series of high-quality models, including MusaBert and MusaSim, will be open-sourced in the near future as a contribution to the Chinese NLP community. After rigorous testing by Moore Threads and the Colossal-AI team, it is now possible to train MusaBert, and even the larger-scale GPT-2, on a single Moore Threads multi-function GPU, greatly reducing the cost of pre-training. This is also a solid step toward the two parties' shared vision of low-resource large-scale model training.
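As a rough illustration of why a model of this size is a plausible fit for a single GPU, the sketch below estimates the memory taken by the model states alone. It assumes plain 32-bit training with an Adam-style optimizer that keeps two extra values per parameter; the article does not describe the actual precision or memory layout used for MusaBert, other optimizers keep different amounts of state, and activation memory is deliberately left out. The function name and defaults are hypothetical.

```python
def training_state_gib(num_params,
                       bytes_per_weight=4,
                       bytes_per_grad=4,
                       bytes_optimizer_state=8):
    """Estimate memory (in GiB) for weights, gradients, and optimizer state.

    Defaults assume fp32 weights and gradients (4 bytes each) plus two
    fp32 optimizer moments per parameter (8 bytes). Activations, buffers,
    and framework overhead are not included.
    """
    bytes_per_param = bytes_per_weight + bytes_per_grad + bytes_optimizer_state
    return num_params * bytes_per_param / 2**30


# 300 million parameters is the MusaBert size stated in the article.
musabert_gib = training_state_gib(300_000_000)
```

Under these assumptions the model states come to 16 bytes per parameter, so the total grows linearly with parameter count; this is the part of the footprint that systems for low-resource training typically try to shrink or offload.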
Ongoing Cooperation for Democratization of AI
In the future, Moore Threads will continue to work closely with us to research the appropriate scale for large natural language models, ways to make full use of upstream data, and methods for producing more capable open source models. Both parties intend to advance the algorithm and the system in parallel, while continuously optimizing the training of large models on the Moore Threads multi-function GPU, especially in low-resource scenarios such as a single consumer-level graphics card. This can greatly reduce the difficulty and cost of large model training, further promoting the democratization of AI.
About Moore Threads
Moore Threads is a high-tech company focusing on full-function GPU chip design, aiming to empower a wide range of technology ecosystem partners with computing power. Founded in October 2020, the company is committed to building a Metaverse computing platform that provides diversified computing power for the next generation of the Internet.