Apps & Design
Robert G. Brown rgb at phy.duke.edu
Mon Jul 3 07:27:13 PDT 2000
On Sat, 1 Jul 2000, Gregory R. Warnes wrote:

> Careful. You are equating "the problem consists of independent pieces"
> with parallelizable.
>
> Stating that natural language processing is inherently non-parallel
> ignores the fact that the only system known to correctly process natural
> language (the brain) is made up of a lot of independent processing units!
>
> The current tools may not approach the problem correctly, but
> that does not mean it isn't possible.

I'd agree with this. I'd also argue that translation CAN be largely parallelized -- it is just a question of granularity.

Parallelizing word by word isn't too sensible, because as Alan noted there are moderately nonlocal referents, and word order in sentences matters. Parallelizing by sentence (a coarser grain size) would work much better -- sentences largely stand alone. However, to get nuances of meaning, it might be useful to have the preceding and following sentences.

Parallelizing by paragraph or by "page" (say 4096 characters mod paragraph) is almost certainly adequate to get the internal sense of the paragraphs and sentences correct, but there might need to be some conflict/continuity resolution at page boundaries -- say, have each node also translate the preceding and following paragraphs, compare results with its neighbors, and shift windows (or even auto-rescale the grain size) until things are "smooth".

Even with quite a lot of rescaling of page size and internode comparison of boundary regions, the parallel speedup one could realize would be quite significant. I'd guess that translation is pretty system intensive work -- looking up base word translation(s) in a hash, determining case, tense, and verb conjugation, identifying and translating idiomatic expressions as a unit; a lot of work -- and communications would be mostly if not entirely local, to the node ahead and the node behind (possibly with some data propagating all the way down the chain).
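To make the "page plus neighboring paragraphs" decomposition concrete, here is a minimal Python sketch. It only builds the per-node work units; it does not translate anything. The names `WorkUnit`, `split_paragraphs`, `make_pages`, and `make_work_units` are hypothetical, and it assumes paragraphs are separated by blank lines and that one paragraph of context on each side is enough.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkUnit:
    before: str       # last paragraph of the previous page (context only)
    body: List[str]   # paragraphs this node is responsible for
    after: str        # first paragraph of the next page (context only)


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines (an assumption) and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def make_pages(paragraphs: List[str], page_size: int = 4096) -> List[List[str]]:
    """Greedily pack whole paragraphs into pages of roughly page_size characters."""
    pages: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        # Close the current page if this paragraph would overflow it.
        # A single oversized paragraph still gets a page of its own.
        if current and used + len(para) > page_size:
            pages.append(current)
            current, used = [], 0
        current.append(para)
        used += len(para)
    if current:
        pages.append(current)
    return pages


def make_work_units(pages: List[List[str]]) -> List[WorkUnit]:
    """Attach the neighboring boundary paragraphs to each page as context."""
    units = []
    for i, page in enumerate(pages):
        before = pages[i - 1][-1] if i > 0 else ""
        after = pages[i + 1][0] if i + 1 < len(pages) else ""
        units.append(WorkUnit(before, page, after))
    return units
```

Each unit could then go to one node. Neighboring nodes both see the boundary paragraphs, so their two renderings of the overlap can be compared, and the page size adjusted if they disagree too often.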
If one assumes that one has 1.5 MB of text to translate (a decent sized novel) on 100 nodes, even if every node ended up translating 30K (twice its 15K share) and transmitting 10K up and downhill to its neighbors, one could expect a speedup of maybe 30 to 50 and little loss of translation quality.

If one wasn't translating a single work, but was instead translating e.g. all the works in the Library of Congress, the problem is obviously embarrassingly parallel -- one could select as a grain size whole "independent" books and obtain excellent parallel speedup with almost any number of nodes.

This wouldn't address Alan's basic observation, though, that machine translations of this sort (parallel or serial, really) lack global human context and hence will make heinous mistakes. It also oversimplifies the very severe problems associated with translating between really different languages that share no significant cultural referents in the first place, e.g. Chinese and English. I have no idea how well ANY translation program works for languages this different -- but I'd assert that whatever serial translation program one wrote could be cleverly parallelized to obtain parallel speedup at some suitable granularity, even if that granularity was at the level of independent works.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
More information about the Beowulf mailing list
