Google Translate
Closed for Renovations

Did Google Get a Little Ahead of Itself?

We think so. In 2016, Google Translate switched from its statistical machine translation model, Phrase-based Machine Translation (PBMT), to the Google Neural Machine Translation model (GNMT). And although this greatly improved its overall translation quality, there is still significant progress to be made. So yes, Google did get a little ahead of itself, and here’s why.

From PBMT to GNMT

In the spring of 2006, Google introduced Google Translate, a completely free, multilingual machine translation service. To compile the necessary linguistic data, Google Translate relied heavily, for the first 10 years of its existence, upon human-translated transcripts from reputable online sources such as the United Nations and the European Parliament. It would feed this data into a statistical machine translation model called Phrase-based Machine Translation (PBMT). With PBMT, translations were performed piece by piece: a sentence was broken into words and short phrases that were translated largely independently of one another. For simple sentences, this worked fairly well without too much of a negative impact on sentence context (meaning). However, PBMT didn’t work so well as sentences became more complex.

When a translation cannot maintain the overall context of a sentence or phrase, the translation begins to break down. In fact, translations can end up becoming completely incoherent. So, after 10 years of increasingly negative feedback tarnishing its standing as a reputable machine translation service, Google introduced Google Neural Machine Translation (GNMT).

GNMT relies on a large artificial neural network to enhance its translation performance. With the assistance of deep learning, it essentially teaches itself how to translate between languages. To put it into plain English, instead of translating a sentence in isolated pieces as PBMT did, GNMT considers the full sentence at once. This advancement helped Google Translate maintain the context of more complex sentences and, therefore, greatly strengthened the overall quality of translation. But it’s even more impressive than that. Instead of relying on a separate, specialized model for each language pair, GNMT employs Zero-Shot Translation. A single system can now learn to translate between multiple languages, including pairs it was never explicitly trained on.
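As a rough illustration of the zero-shot idea, the Python sketch below assumes the approach Google has described publicly for its multilingual system, in which a token naming the target language is placed in front of the source sentence. The function names, the toy language pairs, and the exact token spelling used here are our own illustrative assumptions, not Google's actual code.

```python
from itertools import permutations


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token so one model can serve many pairs."""
    return f"<2{target_lang}> {sentence}"


def zero_shot_directions(trained_pairs):
    """List the translation directions that never appeared in training."""
    langs = sorted({lang for pair in trained_pairs for lang in pair})
    seen = set(trained_pairs)
    return [pair for pair in permutations(langs, 2) if pair not in seen]


# Hypothetical training setup: English<->Japanese and English<->Korean only.
trained = [("en", "ja"), ("ja", "en"), ("en", "ko"), ("ko", "en")]

print(tag_source("Hello", "ja"))
print(zero_shot_directions(trained))
```

In this toy setup, Japanese to Korean and Korean to Japanese are the zero-shot directions: the single shared model is asked for them even though it never saw a direct example of either.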

Essentially, as the system gathers linguistic patterns for one language pair, it is simultaneously learning to translate in other directions, using the same data it collects. Sounds great, right? And it is, for languages that have an abundance of reference material. But how does the translation quality hold up for low-resource languages that do not have enough accessible and reliable human-translated text online?

Google Translate likely applied the brakes to these low-resource languages, right? They likely slowed down a little (whoa, Nelly!) and are patiently waiting until they have enough human-translated texts in these additional languages to offer the same caliber of translation, right? Wrong. And this is our contention. Google, did you get a little too big for your britches? Again, we think so.

Google Translate Became Too Big for Its Britches

Yes, it’s impressive that Google Translate claims to now service over 500 million users each month and that it translates roughly 140 billion words each day. But if even a fraction of these translations is of very low quality, that impressive reputation begins to suffer. Again.

Just because you can, doesn’t mean you should.

If you happen to be using Google Translate for any of the major European languages, such as English, French, Spanish, Italian, or Portuguese, chances are your experience will be fairly positive. This is because Google Translate relies on the millions and millions of human-translated texts that already exist for these languages. However, if you happen to be using this translation software for low-resource languages, the opposite may be true. It stands to reason that if there isn’t enough high-quality human-translated text available online in a given language, Google Translate’s translation quality will suffer. Case in point: English to Kiswahili.

English to Kiswahili (High-resource to Low-resource Language)

When we asked our professional “human” English to Kiswahili translator to evaluate a number of Google Translate’s English to Kiswahili translations over the course of seven weeks, we unintentionally induced a lot of laughter. Of the 490 translations that Google Translate performed, our translator rated, on average, one sentence in every three as a “poor” translation and found that one in every five missed the mark completely.

When we add these together, slightly more than 53 percent of the 490 sentences were inaccurate. And this is just from English to Kiswahili. What about the other low-resource languages for which Google Translate claims to offer translation services? Granted, not all of the sentences we used would be considered “basic,” and some might have even employed idiomatic expressions that are difficult to translate, but this only strengthens our position. Until there is enough linguistic data for Google Translate to access, perhaps they shouldn’t yet be adding these languages into their database.
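For readers who want to check the 53 percent figure, here is a minimal Python sketch of the arithmetic. It assumes the two rounded rates quoted above (one in three and one in five) describe separate groups of sentences; the variable names are ours, and the sentence count it prints is an approximation derived from those rounded rates, not a tally from the evaluation itself.

```python
from fractions import Fraction

TOTAL_SENTENCES = 490  # sentences evaluated, as stated in the article

poor = Fraction(1, 3)    # rated "poor" (rounded rate from the article)
missed = Fraction(1, 5)  # missed the mark completely (rounded rate)

inaccurate = poor + missed                # 8/15 of all sentences
share_percent = float(inaccurate * 100)   # share as a percentage
approx_count = round(TOTAL_SENTENCES * inaccurate)  # approximate sentence count

print(f"{share_percent:.1f}% of sentences, about {approx_count} of {TOTAL_SENTENCES}")
```

One third plus one fifth is eight fifteenths, which is where “slightly more than 53 percent” comes from.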

There are approximately 6,000 languages spoken worldwide, but fewer than 100 have an abundance of online human-translated resources. About 3,000 of the world’s languages have published accessible data, descriptions, and/or dictionaries at varying levels of quality, leaving nearly 3,000 in the low-resource category as far as computational linguistics is concerned. But Google doesn’t seem to mind adding low-resource languages to their database regardless of the scarcity of online linguistic data. 

In fact, we’ve identified all 103 languages that Google Translate currently serves, and out of this number, we estimate that roughly 30 percent fall into the low-resource category. Google, shame on you!

Google Translate Community 

Google has tried to fill in the low-resource gap with its Google Translate Community. This community consists of volunteers who review and offer their own translations. The community is now made up of over 3 million volunteers and roughly 90 million translation contributions. Again, impressive numbers, but that’s not fooling us one bit.

For translations to be considered high quality, there must be at least some accountability measures. However, anyone from anywhere can join the Google Translate Community of volunteers and begin translating in about 0.3 seconds. There is no need to prove you are a native speaker, no need to prove you have the expertise or credentials to translate, and no need to prove that the translations you’re providing are accurate. Google, what are you thinking?

Google Translate – Proceed with Caution

Technology is increasingly becoming a reality within the language services industry, and it is only getting better. In a pinch, Google Translate is great. The meaning of common greetings and simple sentences will likely be clear enough, even if the translation isn’t perfect. Google Translate also provides pronunciation support and can easily be used as an online dictionary. But user beware.

As high-quality machine translation software, it doesn’t pass our smell test, and it has no business, at this stage of the game, offering translation services for low-resource languages. It’s doing a disservice to these languages, their speakers, and those eager to learn them.

OK, so we’re pretty sure you won’t close for renovations, Google Translate, but please consider employing some human professional translators in these low-resource languages to improve your overall translation quality. And by the way, the translation you’re spitting out for the Kiswahili word “marahaba” doesn’t even come close! Can you please clean this one up right away? We’ll be watching.

 

Are you looking for English to Kiswahili “human” translators? Here is a list we are compiling just for you! When machine translation isn’t cutting it, trust the human professionals to get the job done right!

 
