Closed for Renovations
Did Google Get a Little Ahead of Itself?
We think so. In 2016, Google Translate switched from its statistical machine translation model, Phrase-based Machine Translation (PBMT), to the Google Neural Machine Translation model (GNMT). Although this greatly improved the overall translation quality that Google Translate is able to deliver, there is still significant progress to be made. So yes, Google did get a little ahead of itself, and here’s why.
From PBMT to GNMT
In the spring of 2006, Google introduced Google Translate, a completely free, multilingual machine translation service. To compile the necessary linguistic data, for the first 10 years of its existence Google Translate relied heavily upon human-translated transcripts from reputable online sources such as the United Nations and the European Parliament. It fed this data into a statistical machine translation model known as Phrase-based Machine Translation (PBMT). With PBMT, a sentence was broken into words and phrases that were translated largely independently of one another. For simple sentences, this worked fairly well without too much of a negative impact on sentence context (meaning). However, PBMT didn’t work so well as sentences became more complex.
When a translation is unable to maintain the overall context of a sentence or phrase, it begins to break down. In fact, translations can end up completely incoherent. So, after 10 years of increasingly negative feedback tarnishing its standing as a reputable machine translation service, Google introduced Google Neural Machine Translation (GNMT).
GNMT relies on a large artificial neural network to enhance its translation performance. Using deep learning, it essentially teaches itself how to translate between languages from examples of existing translations. To put it into plain English, instead of translating piece by piece as it did with PBMT, GNMT treats the full sentence as the unit of translation rather than a collection of independent chunks. This advancement helped Google Translate to better maintain the context of more complex sentences and therefore greatly strengthened the overall translation quality. But it’s even more impressive than that. Instead of building a separate system for each language pair, translating from one language to another and back again, GNMT can use a single model for many languages, which makes possible what Google calls Zero-Shot Translation. A single system can now learn to translate between multiple languages:
Image source: Google AI Blog
Essentially, as the system gathers linguistic patterns from the language pairs it is trained on (in Google’s example, English to Japanese and back, and English to Korean and back), it also learns how to translate from Korean to Japanese and from Japanese to Korean, a pair it was never explicitly shown, using the same data it is collecting. Sounds great, right? And it is, for languages that have an abundance of reference material. However, for low-resource languages that do not have enough accessible and reliable online human-translated text, how does the translation quality hold up?
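For the technically curious, here is a rough sketch of the idea in Python. Google’s researchers have described steering a single multilingual model by prepending an artificial target-language token (such as `<2ko>`) to the input sentence. Everything else below, including the function names `tag_source` and `is_zero_shot` and the set of trained pairs, is our own hypothetical illustration and not Google’s code.

```python
# Illustrative sketch only: the names and the trained-pair set are assumptions.

# Language pairs for which the model saw human-translated examples (assumed).
TRAINED_PAIRS = {("en", "ja"), ("ja", "en"), ("en", "ko"), ("ko", "en")}


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token so one model knows which language to produce."""
    return f"<2{target_lang}> {sentence}"


def is_zero_shot(source_lang: str, target_lang: str) -> bool:
    """A pair is zero-shot if the model never saw direct examples for it."""
    return (source_lang, target_lang) not in TRAINED_PAIRS


if __name__ == "__main__":
    print(tag_source("Hello", "ko"))   # input the model would receive for English to Korean
    print(is_zero_shot("ko", "ja"))    # Korean to Japanese was never trained directly
    print(is_zero_shot("en", "ko"))    # English to Korean was trained directly
```

The point of the sketch is that nothing in the input format stops you from requesting a pair with no direct training data. Whether the result is any good is a separate matter.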
For these low-resource languages, Google Translate likely put the brakes on, right? They likely slowed down a little (whoa, Nelly!) and are patiently waiting until they have enough human-translated texts in these additional languages to offer the same caliber of translation, right? Wrong. And this is our contention. Google, did you get a little too big for your britches? Again, we think so.
Google Translate Became Too Big for its Britches
Yes, it’s impressive that Google Translate claims to now serve over 500 million users each month, and that it translates roughly 140 billion words each day. But if even a fraction of these translations are of very low quality, that impressive reputation begins to suffer. Again.
Just because you can, doesn’t mean you should.
If you happen to be using Google Translate for any of the major European languages, such as English, French, Spanish, Italian, or Portuguese, chances are your experience will be fairly positive. This is because Google Translate relies on the millions and millions of human-translated texts that already exist for these languages. However, if you happen to be using this translation software for low-resource languages, the opposite may be true. It stands to reason that if there isn’t enough high-quality online human-translated text available in a given language, Google Translate’s translation quality will suffer. Case in point: English to Kiswahili.
English to Kiswahili (High-resource to Low-resource Language)
When we asked our professional “human” English to Kiswahili translator to evaluate a number of Google Translate’s English to Kiswahili translations over the course of seven weeks, we unintentionally induced a lot of laughter. Out of the 490 translations that Google Translate performed, he rated, on average, one sentence in every three as a “poor” translation, and one in every five as having completely missed the mark.
When we add these together, slightly more than 53 percent of the 490 sentences were inaccurate. And this is just from English to Kiswahili. What about all of the other low-resource languages for which Google Translate claims to offer translation services? Granted, not all of the sentences we used would be considered “basic” and some might have even employed idiomatic expressions that are difficult to translate, but this only strengthens our position. Until there is enough linguistic data for Google Translate to access, perhaps they shouldn’t yet be adding these languages into their database.
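For anyone who wants to check our math, here is a minimal Python sketch of the sum. It assumes the two rates are exactly one in three and one in five, whereas the ratings above are averages, so treat the sentence count as approximate. The variable names are ours.

```python
from fractions import Fraction

TOTAL_SENTENCES = 490

# Average rates reported by our translator (assumed exact for this sketch).
poor_rate = Fraction(1, 3)     # rated "poor"
missed_rate = Fraction(1, 5)   # completely missed the mark

# The two categories are separate, so the inaccurate share is their sum.
inaccurate_share = poor_rate + missed_rate
inaccurate_percent = float(inaccurate_share) * 100
approx_sentences = round(inaccurate_share * TOTAL_SENTENCES)

print(f"{inaccurate_percent:.1f}% of {TOTAL_SENTENCES} sentences, about {approx_sentences}")
# prints: 53.3% of 490 sentences, about 261
```

One third plus one fifth is eight fifteenths, which is where “slightly more than 53 percent” comes from.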
There are approximately 6,000 languages spoken worldwide, but fewer than 100 have an abundance of online human-translated resources. About 3,000 of the world’s languages have published and accessible data, descriptions, and/or dictionaries at varying levels of quality, which leaves nearly 3,000 in the low-resource category as far as computational linguistics is concerned. But Google doesn’t seem to mind adding low-resource languages to its database regardless of the scarcity of online linguistic data.
In fact, we’ve identified all 103 languages that Google Translate currently serves and out of this number, we estimate that roughly 30 percent fall into the low-resource category. Google, shame on you!
Google Translate Community
Google has tried to fill in the low-resource gap with its Google Translate Community. Volunteers review translations and offer translations of their own. The community is now made up of well over 3 million volunteers who have made roughly 90 million translation contributions. Again, impressive numbers, but that’s not fooling us one bit.
For translations to be considered high quality, there has to be at least some measure of accountability. However, anyone, from anywhere, can join the Google Translate Community of volunteers and begin translating in about 0.3 seconds. There is no need to prove you are a native speaker, no need to prove you have the expertise or credentials to translate, and no need to prove that the translations you’re providing are accurate. Google, what are you thinking?
Google Translate – Proceed with Caution
Technology is increasingly becoming a reality within the language services industry, and it is only getting better. In a pinch, Google Translate is great. For common greetings and simple sentences, even if the translation isn’t perfect, the meaning will likely be clear enough. Google Translate also provides pronunciation support, and can easily be used as an online dictionary. But user beware.
As high-quality machine translation software, it doesn’t pass our smell test, and at this stage of the game it has no business offering translation services for low-resource languages. It’s doing a disservice to these languages, their speakers, and those eager to learn them.
OK, so we’re pretty sure you won’t close for renovations, Google Translate, but please consider employing some human professional translators in these low-resource languages to improve your overall translation quality. And by the way, the Kiswahili word “marahaba” doesn’t even come close to the translation you’re spitting out! Can you please clean this one up right away? We’ll be watching.
Looking for English to Kiswahili “human” translators? Here is a list we are compiling just for you! When machine translation just isn’t cutting it, trust the human professionals to get the job done right!