Parallelizing Translation

Robert G. Brown rgb at phy.duke.edu
Wed Jul 5 14:38:08 PDT 2000


On Wed, 5 Jul 2000, Nathan L. Cutler wrote:

> But computers by their very nature do not "understand".  They merely carry
> out instructions.
> 
> I don't think translation software can be useful as a replacement for human
> translators in most real-world cases in which human translators are
> commonly used.  Once the frequency of mistakes goes too high, the
> "translation" has to be thrown out and re-done completely from scratch
> because it's simply too human-labor-intensive to root out and correct all
> the mistakes.

Agreed, although you aren't fully conveying the depth of the problem you
allude to.  Parts of it go well beyond mere "understanding". (I have a
tiny bit of very dated expertise in this, because my informal focus in
my undergrad philosophy major was the philosophy of language and how
language is related to thought.  I also write poetry and study
information theory, which provides yet another source of insight.:-)

The real problem is that languages aren't isomorphic.  Period.  Also,
language use is imperfect at best even by native speakers.  Thus even
for human translators (possibly hearing or reading an original document
in their OWN native language), translation (or understanding)
ambiguities can exist which simply cannot be fully resolved.  This was
alluded to in several earlier posts, but it is important to recognize
that it is a problem that transcends whether the translation is done by
humans or machines.  To me a yam is just a yam, but to Trobriand
Islanders there are baby yams, adult yams, yams at a variety of stages
in between, yams you owe to others and yams you are owed, cooked yams
and rotten yams...  all with their own word and cultural implications
(IIRC -- the aforementioned classes were a quarter-century ago so bear
with me if I'm getting the island wrong or something).

So when I write "He gave me some yams." in English or in Trobriand and
try to translate it to the other, the translation will almost certainly
pick up some extra freight in one direction or lose a considerable
amount in the other, even if the full context of the statement is known.
Did he give me the yams because he owed them to me?  Were they ripe? In
one language this wouldn't be ambiguous or even context dependent while
in the other it's unlikely that the sentence's author even knew that the
yams in question were ripe-yams-given-freely or whatever, and there may
have been no NEED for a context-based communication because the sentence
said it all.  Then there are puns, which often simply don't have a
meaningful translation at all.  How do you translate "Mmm-mmm good --
those yams are yammy!" into Trobriand?  Even without the accompanying
groan?

This problem can persist even within a single language.  When I use
words like "isomorphic" in a sentence as I did above, a mathematician
will understand it one (fairly technical) way if the context supports
it:

*** Source: The Free On-line Dictionary of Computing (15Feb98) ***
isomorphic
 
   <mathematics> Two mathematical objects are isomorphic if they
   have the same structure, i.e. if there is an {isomorphism}
   between them.  For every component of one there is a
   corresponding component of the other.

This is the sense I intended, although I had to add quite a lot of
context to ensure communication.  This is because a biologist or less
informed person might well read the sentence that includes the word
isomorphic and understand it a different way:

*** Source: Webster's Revised Unabridged Dictionary (1913) ***
Isomorphic \I`so*mor"phic\, a. (Biol.)
   Alike in form; exhibiting isomorphism.

*** Source: WordNet (r) 1.6 ***
isomorphic
     adj : (biology) having similar appearance but genetically
           different [syn: {isomorphous}]

These are NOT the same thing.  A biologist might well miss the technical
mathematical information being communicated.  A "person off the street"
with a knowledge of the root words might conclude that I was just saying
that different languages "don't have the same shape" (which he already
knew, so what's the big deal -- you might still be able to say
"everything" in both of them, just different ways).  If I were a
biologist instead of a theoretical physicist, I might have MEANT the
biological variant, in which case a reader should be wondering whether I
mean that languages have dissimilar appearances (which is obvious) but
are "genetically the SAME".  God knows what my kids would make of the
word isomorphic.  Human communication is fairly imperfect even within a
language.

So I agree that it is very difficult to build a translator that goes
from Language_A->Language_B->Language_A that recovers even approximately
and vaguely the same document, but many of the problems in doing so
exist even if human translators are used.  If two different persons
perform the first and second steps the resulting document will almost
certainly differ "significantly" from the original, with a deviation
that depends on how much "space" lives in the non-mapped or multiply
mapped sets (how similar the languages and their associated cultural
milieu are).  Whole chunks of what is being communicated will disappear
irreversibly with each conversion, and in many cases new chunks that
weren't in the original will appear.  People think >>different
thoughts<< in different languages (and sentences can even mean different
things or multiple things all at once to two people who speak the same
language) and the >>best<< translation is just an approximation.
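
The round-trip loss described above can be made concrete with a toy model.
What follows is only a minimal sketch, assuming two invented lexicons: the
names A_TO_B, B_TO_A, translate and round_trip are hypothetical, and the
"words" are illustrative labels, not real Trobriand vocabulary.  Several
words of language A collapse onto one word of language B, so the trip back
has to guess.

```python
# Hypothetical toy lexicons, invented for illustration only.
# Language A distinguishes several kinds of yam; language B has one word.
A_TO_B = {
    "yam_owed": "yam",
    "yam_gift": "yam",
    "yam_ripe": "yam",
    "barley": "barley",
}

# Going back, each B word must be rendered by exactly one A word,
# so the translator has to pick (here: always "yam_gift").
B_TO_A = {
    "yam": "yam_gift",
    "barley": "barley",
}


def translate(words, lexicon):
    """Word-by-word lookup; a deliberately naive stand-in for translation."""
    return [lexicon[w] for w in words]


def round_trip(words):
    """Translate A -> B -> A."""
    return translate(translate(words, A_TO_B), B_TO_A)


def survives(words):
    """True if the A -> B -> A round trip returns the original text."""
    return round_trip(words) == words


if __name__ == "__main__":
    original = ["yam_owed", "barley"]
    print(round_trip(original))   # the "owed" distinction is replaced by a guess
    print(survives(["barley"]))   # one-to-one words come back intact
```

Only the words that map one-to-one survive the round trip; for the
multiply mapped ones the return leg invents a meaning that may not have
been in the original.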

Translations of poetry, especially, illustrate this very nicely.
Multivalency, metaphors and allusions disappear wholesale, and in any
event aren't there in the "document" to be recovered under any
circumstances by a "translation program"-- one has to be fully literate
and in tune with the entire cultural context in order to understand a
complex poem in one's >>own<< primary language(s), and even then one is
likely to miss some of the poet's intent or find something they didn't
intend.  Without understanding the poem >>fully<< (something only the
author may have ever done) how can it be translated?

This is a small part of what Walter and Alan were referring to.  There
are a number of mathematical elements to language translation.  There
are information-theoretic and filter-theoretic aspects -- when you've
transformed a color picture into black and white the color is simply
gone.  Maybe all the grass in the original was blue instead of green.
Who could tell from the B&W photo?  I cannot even tell if the color you
see as "green" is what I see as green -- in the case of my colorblind
son I'm fairly certain it is not.  Colorized B&W movies simply represent
an artist's best guess -- maybe that character's tie was yellow and
maybe it was pink, so to speak.  Maybe it even matters (enter whole
discussion of pink polka-dot ties on men in western culture and what we
should read into the character if the tie he wears is in fact pink).
There is a whole mathematics of information theory (built from Shannon's
theorem) that is intimately tied to physics and irreversible processes
as well as to computation and information management.
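
The black-and-white photo point can be checked with a few lines of
arithmetic.  This is a minimal sketch under a stated assumption: gray is
taken to be the plain average of the three color channels, which is one
simple choice and not how any particular camera or film works.  The
function names and the small palette are hypothetical.

```python
def to_gray(rgb):
    """Map an (r, g, b) triple to one gray level by simple averaging.

    Assumption: plain channel average, chosen for simplicity.
    """
    r, g, b = rgb
    return (r + g + b) // 3


def candidates(gray, palette):
    """All colors in a palette that would produce the given gray level."""
    return [name for name, rgb in palette.items() if to_gray(rgb) == gray]


# Hypothetical palette: two very different colors of "grass".
PALETTE = {
    "green grass": (0, 150, 0),
    "blue grass": (0, 0, 150),
    "white cloud": (255, 255, 255),
}

if __name__ == "__main__":
    gray = to_gray(PALETTE["green grass"])
    # Both kinds of grass land on the same gray level, so the gray value
    # alone cannot say which one was in the original scene.
    print(gray, candidates(gray, PALETTE))
```

The map from color to gray is many-to-one, so no inverse exists; any
"colorization" has to choose among the candidates using information that
is not in the gray image.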

However, much of this information-theoretic aspect isn't really relevant
to the issue of parallelizing a software language translator >>as
opposed to<< a serial (or human) language translator.  Both of these
(generally including the best of human translators) will do an
infinitely poor job mapping concepts that don't translate at all and a
poor job at mapping concepts that don't map well.  The issues relevant
to parallelization vs serialization are primarily ones of locality --
including "cultural" locality or "understanding" and just how much
excess verbiage is required to make a reasonably unambiguous (and
technically correct) translation where such a thing is possible.

Here there are some obvious extremes.  Poems or songs are generally
>>very<< meaning-nonlocal, at the granularity of a whole literary
movement or tradition or beyond (to include all of human history and
civilization and a full knowledge of all the sciences and mathematics).
For example, one cannot translate a lot of western poetry without a full
knowledge of e.g. Greek and Roman civilization AND a full knowledge of
the nearest equivalent metaphorical objects in the target
culture/language and even then a lot will be lost or altered or will
presume that a Chinese reader knows the works of William Shakespeare.
Nobody sane would read a machine translation (parallelized or not) of
Dante's Inferno or the Mahabharata or "El jardín de senderos que se
bifurcan".  Even this very non-poetic letter requires that the reader
know that I'm a physicist and use isomorphic in the mathematical sense
to correctly translate the word into a language where the two meanings
might be different words or even phrases altogether.  On the other hand,
translating a document that says that "In Joseph's province, three
thousand bushels of barley were harvested and their taxes paid."  in a
public record probably won't lose a lot even if a translation of the
translation ends up "Bushels of barley numbering three thousand were
harvested in Joseph's province.  Their taxes were paid." and a large
number of public records like this could likely be "safely" translated
in parallel by software.
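
For meaning-local documents like these records, the parallel version is
nearly trivial.  The sketch below is only an illustration under loud
assumptions: translate_record is a hypothetical word-by-word stand-in for
a real translator, GLOSSARY holds placeholder strings rather than any real
language, and a thread pool stands in for the nodes of a cluster.  The only
property it relies on is that each record can be translated without
looking at any other record.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical glossary; the target strings are placeholders, not a real language.
GLOSSARY = {
    "barley": "b_barley",
    "bushels": "b_bushels",
    "taxes": "b_taxes",
}


def translate_record(record):
    """Translate one record using only that record (no shared context)."""
    return " ".join(GLOSSARY.get(word, word) for word in record.split())


def translate_serial(records):
    """Reference version: one record after another."""
    return [translate_record(r) for r in records]


def translate_parallel(records, workers=4):
    """Farm independent records out to a pool of workers.

    pool.map returns results in input order, so the output lines up
    with the input regardless of which worker finished first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(translate_record, records))


if __name__ == "__main__":
    records = [
        "three thousand bushels of barley were harvested",
        "their taxes paid",
    ]
    print(translate_parallel(records))
```

Because no record depends on another, the serial and parallel runs give
the same result.  A poem would break this scheme at once, since the
"record" needed to translate any one line is the whole poem and a good
deal of the culture around it.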

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list