LATELY, RESEARCHING one thing and another, I’ve been enjoying Google Translate. This online tool supports 90 languages as either input source or translated target. Google Translate says that during 2013 it served 200 million people daily.
I’ve used Google Translate with my meager français, even less secure 日本語 and italiano, and utter paucity of Lietuvos, русский and ייִדיש שפּראַך. The slickness of Google Translate in these applications got me interested in how it works. And this led me to several tidbits.
Intriguing Google Translate lore is contained in “The Robots Are Coming,” by John Lanchester, in the March 5, 2015 issue of London Review of Books. As Lanchester describes, Google Translate “hoovered up [Brit for vacuumed] gigantic quantities of parallel texts into its database.” It is, in a sense, a mountain of Rosetta Stones, a means of translating one language to another by having recognized renderings in both.
For example, the United Nations has six official languages (Arabic, Chinese, English, French, Russian and Spanish). The European Union has a rule that all member languages have parity. In general, official documents of each are translated by conventional means into all the necessary languages. From these databases, Google Translate’s proprietary software uses statistical means to choose which target rendering best matches a source statement.
This approach is known as SMT, Statistical Machine Translation, as opposed to earlier rule-based counterparts (which are still in use). Rule-based systems stress linguistics, identifying object/verb/subject patterns, for example, and using these to translate from source to target.
SMT profits from a mass of languages already in machine-readable format, what’s called a “corpus” in the trade. Unlike rule-based software, SMT software needn’t be designed for any particular pair of languages.
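To make the statistical idea concrete, here is a minimal sketch in Python. It is a toy, not Google Translate's proprietary method: the tiny hand-made corpus, the names PARALLEL_CORPUS, build_phrase_table and translate, and the rule of simply picking the most frequent rendering are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made "parallel corpus": each pair is a source phrase
# and one human rendering of it. Real corpora hold vastly more text.
PARALLEL_CORPUS = [
    ("bonjour", "hello"),
    ("bonjour", "good morning"),
    ("bonjour", "hello"),
    ("merci", "thank you"),
    ("merci", "thanks"),
    ("merci", "thank you"),
]


def build_phrase_table(pairs):
    """Count how often each target rendering appears for each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(phrase, table):
    """Return the most frequently seen rendering, or None if the phrase is unseen."""
    renderings = table.get(phrase)
    if not renderings:
        return None
    best, _count = renderings.most_common(1)[0]
    return best


table = build_phrase_table(PARALLEL_CORPUS)
print(translate("bonjour", table))
print(translate("merci", table))
```

Nothing in this sketch knows any French or English grammar. Swap in pairs from two other languages and the same code runs unchanged, which is the sense in which such software needn't be designed for a particular language pair.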
There are still quirks, however. For instance, with some language pairs, it’s more efficient for Google Translate to function through an intermediate one, often using English as a bridge between source and target. Also, some languages use other bridges to English. For instance, a Google Translate rendering of Catalan to English, or vice versa, uses Spanish in between. Haitian Creole uses French; Ukrainian uses Russian.
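The bridging idea can be sketched in a few lines of Python. The three bridges come from the examples above; everything else is an assumption for illustration, including the function names and the simplification that every pair is routed through English (the real tool does this only for some pairs).

```python
# Bridges named in the text; a language maps to the one it passes through
# on the way to English.
BRIDGES = {
    "Catalan": "Spanish",
    "Haitian Creole": "French",
    "Ukrainian": "Russian",
}


def path_to_english(language):
    """List the hops from a language to English, using its bridge if it has one."""
    if language == "English":
        return ["English"]
    bridge = BRIDGES.get(language)
    if bridge:
        return [language, bridge, "English"]
    return [language, "English"]


def drop_loops(route):
    """Cut out any detour that returns to a language already visited."""
    cleaned = []
    for language in route:
        if language in cleaned:
            cleaned = cleaned[:cleaned.index(language)]
        cleaned.append(language)
    return cleaned


def translation_route(source, target):
    """Assumed simplification: go source -> English -> target, then trim loops."""
    outbound = path_to_english(source)
    inbound = list(reversed(path_to_english(target)))
    return drop_loops(outbound + inbound[1:])


print(translation_route("Catalan", "English"))
print(translation_route("Haitian Creole", "Ukrainian"))
```

Each hop in a route is a separate translation, so a long chain such as Haitian Creole to Ukrainian gives errors several chances to creep in.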
In any translation, a phrase-by-phrase strategy can be misleading. As an example, a British football story citing “The coach of Manchester United states that it should be a hot match” offers two translational red herrings: “United states” can be taken for the country, and a “hot match” for something that lights a fire.
A word-by-word strategy has its pitfalls as well. There’s the (likely apocryphal) story of an American Embassy official in Moscow who received a message sent from the U.S. in English, then translated into Russian by the host country, then rendered back into English at the embassy. It read, alarmingly, “Your son hanged for juvenile crimes.”
The original, detailing college high jinks back in the U.S., read, “Your son suspended for minor offenses.”
Even with correct rendering, there’s the ambiguity of languages. Wife Dottie and I have a personal one-liner, looking intently at the other and saying, “I don’t deserve you.”
Now what’s that supposed to mean?
Gender complications arise in translation too. Contrasted with many languages, English is relatively free of gender-specific grammar. This can be to its detriment.
Mozart’s opera Così fan tutte, for example, is literally “Thus do they all.” However, the Italian tutte is the feminine plural, and a more accurate rendering is “Thus do all women.”
In fact, the opera’s theme concerns two guys wagering whether their respective girlfriends are true to them. (Spoiler alert: They aren’t completely, but it ends happily.) A spirited rendering of the title might be All Gals Are Like That.
Google Translate ran into a similar problem with gender-sensitive languages. For instance, assigning gender in translations of “I drive” and “I cook” from English to Hebrew, the software arbitrarily chose the masculine verb for the former rendering and feminine for the latter. After objections, it sidestepped the issue by selecting the masculine verb form throughout. (“I dance a plié in my tutu” must look odd indeed.)
As a final tidbit, German computer scientist Franz Och was Distinguished Research Scientist at Google and the chief architect of Google Translate. He’s now Chief Data Scientist at Human Longevity, Inc., in San Diego, California. In addition to his native German, Och speaks English and some Italian. And a whole lot of computerese. ds
© Dennis Simanaitis, SimanaitisSays.com, 2015