7. Improvement of English-to-Chinese MT of Pharmaceutical Texts
In our last blog post, we discussed how Chilin offers English-Chinese parallel sentence pairs on the TAUS Data Marketplace. Our first offerings are in the Pharmaceutical-Biotechnology domain.
Does this data actually improve machine translation? The answer is “yes,” as we show below.
Chilin conducted a test using the parallel sentence data and the AutoML Translation tool on the Google Cloud Platform. The study was based on:
8,000 English-Chinese sentence pairs for TRAINING.
2,000 sentence pairs for VALIDATION.
2,000 sentence pairs for TESTING.
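As a rough illustration of the 8,000 / 2,000 / 2,000 partition above, here is a minimal Python sketch that shuffles a list of sentence pairs and slices it into three disjoint sets. The function name `split_pairs`, the fixed random seed, and the placeholder data are hypothetical; this is not necessarily how Chilin prepared its data. Keeping the sets disjoint matters because a test score is only meaningful on sentences the model never saw during training.

```python
import random

def split_pairs(pairs, n_train=8000, n_val=2000, n_test=2000, seed=0):
    """Shuffle (English, Chinese) pairs and partition them into disjoint train/validation/test sets."""
    if len(pairs) < n_train + n_val + n_test:
        raise ValueError("not enough sentence pairs for the requested split")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

if __name__ == "__main__":
    # Hypothetical placeholder data: 12,000 (English, Chinese) pairs.
    pairs = [(f"en sentence {i}", f"zh 句子 {i}") for i in range(12000)]
    train, val, test = split_pairs(pairs)
    print(len(train), len(val), len(test))  # 8000 2000 2000
```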
Once the data is selected and loaded, AutoML makes it relatively simple to train a custom machine translation model. This simplicity is by design, as Francesco Bombassei of Google explained in an interview on Slator.com, “Inside Google’s Custom Neural Machine Translation—AutoML Translate”: “Under the hood, he explained that AutoML works with transfer learning and neural architecture search. Transfer learning is a way to use machine learning models as a basis for training others.” In other words, domain-specific training data is used to adapt the base Google NMT system to that specific domain.
The Results: The Chilin custom MT model showed a BLEU score improvement of almost 5 points over the standard Google MT model.
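To make “almost 5 points” concrete: BLEU compares machine output with a reference translation by counting matching n-grams, and is usually reported on a 0–100 scale. Below is a minimal Python sketch of standard corpus-level BLEU (clipped 1- to 4-gram precisions, geometric mean, brevity penalty) with one reference per sentence. Splitting Chinese into individual characters is an assumption made here for simplicity; AutoML Translation computes its own BLEU score, and its exact tokenization may differ, so numbers from this sketch are not directly comparable to the reported result.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per segment, scaled to 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

# Assumed character-level tokenization for Chinese.
ref = list("基质金属蛋白酶")
print(corpus_bleu([ref], [ref]))  # 100.0
```

Under this metric, a gain of almost 5 points means the custom model scored roughly 5 higher than the baseline on the same evaluation sentences, scored the same way.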
The testing was performed on patent-based parallel sentences outside the Training, Validation and Test sets. Here are some sample results:
English to Chinese Example #1
A. Google Standard Machine Translation
B. Google MT augmented with Chilin data
MMPs are known to be synthesized as latent precursor enzymes that can be activated by limited proteolysis, but the exact mechanism by which this activation takes place in vivo is largely unknown.
There was no significant difference in the frequency of apoptosis in tumor cells in the treated xenografts, and no clear effect on angiogenesis as measured by microvascular density (MVD) via immunohistochemical staining for the endothelial cell marker, CD31.