The History of Computer Language Translation

  January 30, 2014

We’re starting to take for granted the many tools that translate from one human language to another, not least the translation built into Web browsers, even when they do an imperfect job. The technical road to even relative success in this realm, however, has been a rocky one.

Expressing oneself in another language has never been considered simple. Anyone who hoped in secondary school to get away with one-word-at-a-time dictionary lookup knows how hopeless that approach is.

Human errors in translation can be, and have been, cataclysmic. In July 1945, during World War II, the United States, Britain, and China issued the Potsdam Declaration, demanding the surrender of Japan. Japanese Premier Kantaro Suzuki called a news conference and issued a statement that was meant to be understood as “No comment. We’re still thinking about it.” That is not the message that reached Harry Truman. Suzuki had used the word “mokusatsu,” which can also mean “We’re ignoring it in contempt.” Less than two weeks later, the first atomic bomb was dropped.

The desire for technology to solve the language problem goes back quite a ways. The earliest mention of a “universal translator” is in Murray Leinster's short story “First Contact” (Astounding Science Fiction, May 1945). And we all know that when the Enterprise NX-01 launched in 2151 (Star Trek’s “Broken Bow,” first aired 26 September 2001), the Universal Translator was still “experimental.” (There are plenty more examples of science fiction with language or linguistics as a plot device, in case this topic appeals to you. I'd suggest H. Beam Piper’s “Omnilingual” and Jack Vance’s The Languages of Pao.)

But in what I laughingly refer to as the “real world,” 2151 is still 137 years off. Although people have been working on machine translation since the 1950s, we're not quite there yet.

Maschinenübersetzung (machine translation): upsetting the machine

Work on machine translation began in the 1950s, with research at both MIT and Georgetown starting in 1951. Research also took place in Japan and the USSR, and an international conference was held in London in 1956.

In The New York Times for August 16, 1964, Isaac Asimov noted, “The I.B.M. exhibit at the present fair [New York World's Fair] has no robots but it is dedicated to computers, which are shown in all their amazing complexity, notably in the task of translating Russian into English.” That same year, the National Academy of Sciences set up the Automatic Language Processing Advisory Committee (ALPAC). Two years later, ALPAC issued a report sharply critical of a decade's research, and funding was cut.

The linguistics problem is easy to state: humans appear to hear or read the source text, decode its meaning, recode that meaning into the target language, and write it out or speak it. But how can we program a computer to "understand" the way a person does, and to "create" a new text in the target language that "sounds" as though it was written by a person?

Answer: Set up linguistic rules for the machine to employ.

When I was at IBM Research in the mid-1980s, I spent some time with a Natural Language group. One problem I encountered was that the computer insisted that certain perfectly grammatical structures were ungrammatical.

For example, in English the third-person singular of a present-tense verb normally ends in -s (he cooks, she drives, it burns). In some subordinate clauses, however, such as the subjunctive after verbs like “ask” or “insist,” it doesn't: “I couldn't get them to send the book out of the country, so I wrote Stuart and asked that he buy me a copy.” No way. The machine wanted “buys.”
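The kind of rule that tripped up the system can be illustrated with a deliberately naive checker. This is a minimal sketch, not the IBM system's grammar: the word lists and the names `agreement_errors`, `BARE_VERBS`, and `MANDATIVE_VERBS` are hypothetical, and the "exception" handles only the narrow pattern “asked/insisted/... that he/she/it + bare verb.”

```python
# A toy, hypothetical rule-based agreement checker (not the IBM system's grammar).
THIRD_PERSON_SINGULAR = {"he", "she", "it"}
BARE_VERBS = {"cook", "drive", "burn", "buy"}  # tiny illustrative lexicon
MANDATIVE_VERBS = {"asked", "insisted", "demanded", "suggested"}


def agreement_errors(sentence, allow_subjunctive=False):
    """Flag 'he/she/it + bare verb' as a missing -s, optionally excusing the subjunctive."""
    words = sentence.lower().replace(",", "").rstrip(".").split()
    errors = []
    for i in range(1, len(words)):
        subject, verb = words[i - 1], words[i]
        if subject not in THIRD_PERSON_SINGULAR or verb not in BARE_VERBS:
            continue
        # Mandative subjunctive: "... asked that he buy ..." correctly takes the bare verb.
        if (allow_subjunctive and i >= 3
                and words[i - 2] == "that" and words[i - 3] in MANDATIVE_VERBS):
            continue
        errors.append(f"{subject} {verb}: expected {verb}s")
    return errors


sentence = ("I couldn't get them to send the book out of the country, "
            "so I wrote Stuart and asked that he buy me a copy.")
print(agreement_errors(sentence))                          # ['he buy: expected buys']
print(agreement_errors(sentence, allow_subjunctive=True))  # []
```

The point of the sketch is that a rule-based system is only as good as its exceptions: without the hand-written subjunctive clause, the simple agreement rule rejects a sentence any eight-year-old would produce.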

I asked several people at Harvard, where there was a child language group, at what age most children achieved the grammatical distinction. They agreed that the form was recognized at five or six and produced by eight. My IBM 370 mainframe computer was not yet at a six-year-old’s language level.

But let's be more practical and more realistic.

After the 1966 funding cuts, the Georgetown project gave rise to SYSTRAN, founded by Dr. Peter Toma in San Diego in 1968. Large numbers of Russian scientific and technical documents were translated with SYSTRAN under the auspices of the USAF Foreign Technology Division. Although the translations were only approximate, the narrow subject matter meant they were usually adequate for understanding the content.

Proverbs and adages are notorious for poor machine translation: “Out of sight, out of mind” has come out as “Invisible lunatic” on more than one occasion, and “The spirit is willing, but the flesh is weak” as “The liquor is strong, but the meat is weak.”

SYSTRAN (now a publicly traded company on the Paris exchange) is the basis for Babel Fish and the OS X translation widget. Until a few years ago, it was also used by Google. SYSTRAN is commercially available on Microsoft Windows, Linux, and Solaris. Historically, SYSTRAN used rule-based translation, drawing on linguistic information from dictionaries and grammars; the 2010 “Server 7” release, however, adds a statistical method, in which translations are generated from statistical models whose parameters are derived from analyses of bilingual texts.
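To make “parameters derived from analyses of bilingual texts” concrete, here is a minimal sketch of one classic statistical technique, IBM Model 1 word alignment, trained with expectation-maximization (EM). It is not SYSTRAN's method; it is a simplified textbook model (no empty “NULL” word, no word reordering), and the function names and the three-sentence German-English corpus are hypothetical illustrations.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(e, f)] = P(English word e | foreign word f) from sentence pairs via EM."""
    english_vocab = {e for _, es in pairs for e in es}
    # Start uniform: every English word is equally likely for every foreign word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f is aligned to e
        total = defaultdict(float)  # expected number of alignments involving f
        for fs, es in pairs:
            for e in es:
                # E-step: share e's alignment among the foreign words in its sentence.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate each t(e | f) from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Most probable English word for foreign word f under the learned table."""
    candidates = {e: p for (e, g), p in t.items() if g == f}
    return max(candidates, key=candidates.get)


# Toy German-English corpus (hypothetical example data).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_ibm_model1(corpus)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

On this toy corpus the learned table pairs das with the, Haus with house, Buch with book, and ein with a, although no sentence pair states any of these links directly: the model infers them from how words co-occur across pairs. This is also why such systems shine on repetitive texts and struggle with phrasing they have never seen.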

In the 1980s, both IBM and BBN undertook projects to translate business correspondence, legal materials, and technical materials (generally easier than personal documents or literature).

In 1995, Oscar Jofre founded The Babel Fish Corporation. This should not be confused with the “AltaVista Babel Fish translation service,” established by DEC and SYSTRAN in 1997. In 2003, AltaVista was bought by Overture, which in turn was taken over by Yahoo!. In 2012, the Babel Fish translator was replaced by Microsoft's Bing Translator. (I don't want to get into a war, but in tests of French and Spanish to English, several sites rate Google higher than Bing.)

Again, while statistical translation is effective for routine texts, in which many phrases recur, it cannot do justice to original work.

Last December, an acquaintance wrote me a tale of woe. I responded appropriately, ending with, “Permitte divis cetera.” “Give the rest to the gods?” he responded. I replied, “No. 'Leave everything else to the gods.' It's from Horace, Odes, I.ix.9. Horace wrote: When it is bitterly cold outside, pile up logs on the fire and drink vintage wine. Leave everything else to the gods.” My friend wrote back: “So much for relying on Google Translate.”

Google actually does fairly well. For Goethe's “Wider den Tod ...” epigram, it yields “Against the death no herb is grown.” That is stilted (the idiomatic sense is “There is no remedy against death”), but it's a lot better than the gee-if-it-only-were-English “The wider toad....”

Humans are still far better than machines at understanding meaning. Translation is a difficult process, far more than word substitution.

A well-known geek pointed out to me that “The sine qua non of 'machine translation' has to be C-3PO. He speaks several thousand languages including 'the binary language of moisture 'vaporators'.”

Dylan Thomas' “The force that through the green fuse drives the flower drives my green age” is translated into French (by Google Translate) as “La force fait par les lecteurs de fusibles vert de la fleur de mon âge Durs vert.” And if I re-translate this, I get: “Of force by the green fuse drives the flower Drives my green age.”

We're not there yet.
