8. Improvement of Chinese-to-English MT of Pharmaceutical Texts
In our last blog, we discussed how Chilin offered English-Chinese parallel sentence pairs on the TAUS Data Marketplace. Our first offerings are related to the Pharmaceutical-Biotechnology domain.
In the English-to-Chinese case, we found that, within the Pharmaceutical-Biotechnology domain, adding Chilin parallel sentence data to the base Google model increased the BLEU score by almost 5 points.
In this blog post, we look at using Chilin parallel sentence data to improve Chinese-to-English translation. Chilin’s parallel sentence data is based on published patent data. Patent applications must be very accurate to meet the standards of the national patent offices. For that reason, we feel that the data is good enough to be used in both directions: English to Chinese and Chinese to English. For this test, we used the same training data as in the English-to-Chinese test, but with Chinese set as the source language and English as the target.
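Reusing the same data in the opposite direction amounts to swapping the source and target columns. The following is a minimal sketch, assuming the pairs are stored one per line as tab-separated source and target text; the function name `swap_direction` and the sample sentence pair are hypothetical and are not taken from the Chilin data set.

```python
def swap_direction(lines):
    """Swap source and target in 'source<TAB>target' lines."""
    swapped = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"expected exactly one tab per line, got: {line!r}")
        source, target = parts
        swapped.append(f"{target}\t{source}")
    return swapped


# Illustrative pair only (not from the Chilin data).
en_zh = ["The formulation was frozen.\t将制剂冷冻。"]
zh_en = swap_direction(en_zh)
print(zh_en[0])
```

Because only the column order changes, the sentence pairs used in both tests stay identical, which keeps the two results comparable.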
Chilin conducted a test using the parallel sentence data and the AutoML Translation tool on the Google Cloud Platform. The study was based on:
8,000 Chinese-English sentence pairs for TRAINING.
2,000 sentence pairs for VALIDATION.
2,000 sentence pairs for TESTING.
As before, we used Google’s AutoML to train the custom model.
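A split like the one listed above can be sketched as follows. This is only an illustration: it assumes the third set of 2,000 pairs is a held-out test set, and that the pairs are shuffled at random before splitting, which this post does not state. The function name `split_pairs` and the placeholder sentences are hypothetical.

```python
import random


def split_pairs(pairs, n_train=8000, n_validation=2000, n_test=2000, seed=0):
    """Shuffle sentence pairs and cut them into three disjoint sets."""
    needed = n_train + n_validation + n_test
    if len(pairs) < needed:
        raise ValueError(f"need at least {needed} pairs, got {len(pairs)}")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:needed]
    return train, validation, test


# Placeholder data standing in for 12,000 Chinese-English pairs.
pairs = [(f"zh sentence {i}", f"en sentence {i}") for i in range(12000)]
train, validation, test = split_pairs(pairs)
print(len(train), len(validation), len(test))
```

Keeping the three sets disjoint matters because a BLEU score measured on sentences the model has already seen during training would overstate the improvement.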
The Results: The Chilin custom MT model improved the BLEU score by 3.6 points over the base Google NMT model.
The gain was smaller than in the English-to-Chinese test (3.6 points versus almost 5), but the custom engine still scored higher than the base model.
A. Google Standard Machine Translation
The rate control membrane can be made of permeable, semi-permeable or microporous materials, which are materials known in the art that can control the rate of drug entering and exiting the delivery device, or materials disclosed in the patents cited above for reference.
B. Google MT augmented with Chilin data
The rate-controlling membrane may be made of permeable, semi-permeable, or microporous materials that are known in the art to control the rate of drug ingress and egress from the delivery device, or materials disclosed in the patents previously incorporated by reference.
C. Human Translated English
The rate controlling membranes may be fabricated from permeable, semi-permeable, or microporous materials which are known in the art to control the rate of drugs into and out of delivery devices or are disclosed in the aforementioned patents previously incorporated herein by reference.
Chinese to English Example #3
A. Google Standard Machine Translation
In order to prepare mannitol crystals, the preparation is frozen in a thermal cycle (annealing), and then the product is dried once and twice.
B. Google MT augmented with Chilin data
In order to obtain mannitol crystals, the preparation was frozen in a thermal cycle (annealing), followed by primary and secondary product drying.
C. Human Translated English
To achieve crystallization of the mannitol, the formulations were frozen with thermal cycling (annealing), followed by primary and secondary drying of the cakes.
While the two machine translations are similar, we have found that the enhanced model produces output that is more accurate and more readable, as one would expect with a 3+ point BLEU score improvement.
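To give a feel for what BLEU measures, here is a simplified sentence-level sketch applied to Example #3, with the human translation as the reference. It is not the calculation AutoML performs: it assumes whitespace tokenization, a single reference, and add-one smoothing, and the function names are our own. Scores reported for a whole test set are computed over the full corpus, so the numbers from this sketch are not comparable to the 3.6-point figure.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU (0-100) for one candidate against one reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        # Add-one smoothing (an assumption) avoids log(0) when an order has no match.
        log_precision_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty for candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(log_precision_sum / max_n)


google_standard = ("In order to prepare mannitol crystals, the preparation is frozen in a "
                   "thermal cycle (annealing), and then the product is dried once and twice.")
google_chilin = ("In order to obtain mannitol crystals, the preparation was frozen in a "
                 "thermal cycle (annealing), followed by primary and secondary product drying.")
human = ("To achieve crystallization of the mannitol, the formulations were frozen with "
         "thermal cycling (annealing), followed by primary and secondary drying of the cakes.")

score_a = sentence_bleu(google_standard, human)
score_b = sentence_bleu(google_chilin, human)
print(f"A vs. human: {score_a:.1f}")
print(f"B vs. human: {score_b:.1f}")
```

In this sketch, translation B scores higher than translation A because it shares longer word sequences with the human reference, such as "followed by primary and secondary". A single sentence is a noisy sample, though, so the corpus-level score remains the more reliable measure.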
In the above three examples, which of the A, B, and C translations is the best? Is human translation always the best, and error-free? We welcome your views!