I’ve been fascinated by Google Translate for a long time, having backgrounds, to varying degrees, in statistical learning, languages, and translation. Of course, I’m broadly interested in the role of technology in shaping culture and communication across boundaries.
The singular achievement of Google Translate is that it has “industrialized” the translation business: it can translate a lot of material fast, at “acceptable” quality, without even needing to understand what is being translated, as long as there is enough data. By its nature, it is extremely effective at translating “common usage,” stripped of nuances, subtleties, ironies, and other “odd” linguistic features. After all, there is plenty of data for common, everyday usage, which happens to be common in every language out there. Florid, ironic, and other “weird” uses of language, on the other hand, are statistical blips that the algorithm cannot understand. It takes actual knowledge of the language(s) and its (their) peculiarities to make sense of them.
For the somewhat barbaric, “business” use of language, this is more than fine. If you just want the basic gist of something, subtleties and nuances are a hindrance. You want to get the point across, and that is perfectly well suited to the workings of Google Translate. But the interesting cultural variations lie in how the language is used, what kinds of nuances are invoked, and other subtleties. None of these subtleties get translated and, in the absence of that context, the boundaries to actual understanding remain closed. In other words, Google Translate opens boundaries only if all cultures, groups, and organizations think alike, except in mechanically different languages.
One of the odder but, in retrospect, better Star Trek: The Next Generation episodes dealt with this directly: yes, we can translate exactly what the other guys said, but who the heck are Darmok and Jalad, and what the heck is Tanagra? Of course, the episode compounds the error by having Picard talk to the alien about Gilgamesh and Ur…as if, just because he is now making references to Earthly myths, the other side can somehow make sense of what he is talking about. In the end, the bottom line is this: whether you speak of Ur, Gilgamesh, Zeus, or Shennong, you’d better know what the heck these references are, and be aware that the other guy knows what they are–and what you know of them–before you can communicate using them as reference points.
This is not just a problem with languages, but with all sorts of symbology and common references. In one of the many subtle and brilliant observations he made, Thomas Schelling pointed out that the centroid of a city would be a wonderful reference point for a group of mathematicians familiar with the geometric shape of the city, but totally meaningless to anyone else. Effective communication requires that 1) we know what references we are making; 2) they know what references we are making; and 3) both we and they know what the other knows. The third point is often lacking, and leads to weirdness: we “know” that the price system is a better reflection of market information than any single source and is, moreover, a cheaper source of information than actually learning about what’s going on that affects the market. The consequence is that the informativeness of the price system subverts itself, essentially as a prisoner’s dilemma: nobody learns enough to outperform the price system, until the price system is sufficiently messed up. Precisely because the price system is messed up, but also because traders are rewarded by how well they can play the price system, even otherwise knowledgeable people have to play dumb and follow the markets–as long as they believe there are enough dumb people in the market, though smart people acting dumb are impossible to tell apart from dumb people acting dumb. As a Japanese proverb supposedly says (I think), if all the foxes in a forest had their tails chopped off, it’s the fox with the long tail that is the strange one; and if there is a penalty for not fitting in, being like others trumps being right.
The conceit of Google Translate, then, is to seal off our minds from those who think differently, because we cannot easily “measure” what makes them tick. They are Darmok and Jalad and Tanagra, and since they make no sense to us, even if we can translate them, they can’t possibly be relevant to us. We will stick to our Gilgamesh and Ur, and if they want to talk to us, they will have to learn about Gilgamesh and Ur–although they can talk about Gilgamesh and Ur in their own language, because we can translate that, as long as it’s about Gilgamesh and Ur, that is.
In Orwell’s 1984, the totalitarian dystopia of Oceania is surprisingly multicultural. If Orwell had had a glimpse of the future, perhaps he would not have spent so much time devising Newspeak in English, but instead a universal family of Newspeaks in different languages that happen to be structurally the same (or at least logically equivalent in a predictable fashion): all the voices across Oceania speaking in unison via many languages but making the same point in exactly the same structure, about how they all love Big Brother and hate Emmanuel Goldstein, and how diverse and multifaceted they are because they speak so many different languages, even though they are all exactly the same (so that they are Google Translatable). This is a sobering thought.