INTEGRATED SEGMENTATION SYSTEM

FOR SIMPLIFIED AND TRADITIONAL CHINESE

Chinese word segmentation, or tokenisation, is the process of breaking a string of Chinese characters into meaningful units (words). It is a crucial step in processing Chinese language data, and it is non-trivial because Chinese text has no inherent word boundaries.
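
As a rough illustration of the task, the sketch below shows one common textbook approach, dictionary-based forward maximum matching, in Python. The toy dictionary, the function name fmm_segment and the greedy strategy are assumptions made for the example only; they are not the method used by the system described here.

    # A minimal sketch of dictionary-based forward maximum matching,
    # one common approach to Chinese word segmentation.  The tiny
    # dictionary and the greedy strategy are illustrative assumptions
    # only, not the system described on this page.

    def fmm_segment(text, dictionary, max_word_len=4):
        """Greedily match the longest dictionary word at each position."""
        words = []
        i = 0
        while i < len(text):
            match = text[i]  # fall back to a single character
            for length in range(min(max_word_len, len(text) - i), 1, -1):
                candidate = text[i:i + length]
                if candidate in dictionary:
                    match = candidate
                    break
            words.append(match)
            i += len(match)
        return words

    # Example: "我喜欢自然语言处理" -> ['我', '喜欢', '自然语言', '处理']
    sample_dict = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
    print(fmm_segment("我喜欢自然语言处理", sample_dict))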
 
Since 1995 we have analysed large Chinese corpora comprising over 500 million traditional and simplified characters. On this basis we have developed a highly accurate, integrated Chinese word segmentation system that handles both kinds of text, and we now offer it as a service.