[Beowulf] High Performance for Large Database
hanzl at noel.feld.cvut.cz
Wed Oct 27 12:52:06 PDT 2004
> I'm also very interested in just what sort of symbolic manipulation you
> are working on.

The numerical/symbolic mix behind my opinions comes from natural language processing, mostly speech recognition.

It involves a training phase that uses a huge amount of recorded speech, iteratively turned into estimated statistical distributions of phoneme sounds (multivariate Gaussians with some 500,000 parameters, work for the FPU), and a huge amount of text turned into dictionaries and grammar rules (symbolic, and maybe even SQL). This phase is not very beowulfish: processes can work locally for minutes.

Then there is the recognition phase, when we match unknown utterances against our models of sounds, pronunciations, dictionaries and grammar. This is very beowulfish: we need to estimate zillions of partial hypotheses and compose them together somehow, likely in real time, and we are happy to pass quick messages around and keep most things in aggregated cluster RAM.

Training on huge speech data has very much the pattern just described by Mark Hahn:

> depends. for instance, it's not *that* uncommon to have DB's which
> see almost nothing but read-only queries (and updates, if they happen
> at all, can be batched during an off-time.) that makes a parallel
> version quite easy

That holds for us too (though we do not have wav files in SQL :-) ), and we are very interested in ways to divide our data into chunks cached on the nodes' local hard disks and processed again and again (say 30 times during one iterative process, and we try many variants of this process on the same data).

Of course we have just one cluster for both things, so it constantly switches between being a beowulf and not being a beowulf :-)

Best Regards
Vaclav Hanzl
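The chunk-caching pattern above (data split once across nodes, each node keeping its chunk on local disk, then about 30 passes per training run) can be sketched roughly as follows. This is a minimal illustration under assumptions that are not in the post: the "nodes" are simulated in one process, the "files" are tiny lists of numbers standing in for acoustic features, the model is a two-component 1-D Gaussian mixture trained with EM instead of real multivariate phoneme models, and every function name is hypothetical. The point is only the shape of the loop: a deterministic file-to-node assignment so each pass re-reads the same local chunk, and only small per-node accumulators being combined between passes.

```python
# Sketch: node-pinned data chunks + repeated EM passes (simulated cluster).
# Hypothetical names; nodes are simulated in one process, and a 1-D two-component
# Gaussian mixture stands in for real multivariate phoneme models.
import math
from collections import defaultdict


def assign_files(file_sizes, n_nodes):
    """Deterministic, size-balanced mapping of files to nodes.

    Largest files first, each to the currently least-loaded node. Same input,
    same mapping, so every pass re-reads a chunk already on that node's disk.
    """
    loads = [0] * n_nodes
    owner = {}
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        node = min(range(n_nodes), key=lambda i: (loads[i], i))
        owner[name] = node
        loads[node] += size
    return owner, loads


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def local_estep(samples, params):
    """E-step on one node's cached chunk, with no communication.

    Returns per-component accumulators [sum g, sum g*x, sum g*x^2], where g is
    the component's responsibility for x. Only these few numbers leave the node.
    """
    acc = [[0.0, 0.0, 0.0] for _ in params]
    for x in samples:
        lik = [w * gauss_pdf(x, m, v) for (w, m, v) in params]
        z = sum(lik)
        if z == 0.0:  # x underflows under every component; skip it in this sketch
            continue
        for j, l in enumerate(lik):
            g = l / z
            acc[j][0] += g
            acc[j][1] += g * x
            acc[j][2] += g * x * x
    return acc


def mstep(node_accs, old_params, var_floor=1e-3):
    """Sum the accumulators from all nodes and re-estimate the mixture."""
    total = [[sum(acc[j][i] for acc in node_accs) for i in range(3)]
             for j in range(len(old_params))]
    n = sum(t[0] for t in total)
    new_params = []
    for (g, gx, gxx), old in zip(total, old_params):
        if g <= 1e-12:  # component received no data: keep its old parameters
            new_params.append(old)
            continue
        mean = gx / g
        var = max(gxx / g - mean * mean, var_floor)
        new_params.append((g / n, mean, var))
    return new_params


def train(chunks_by_node, params, n_iters=30):
    """Repeated passes over the same per-node chunks (about 30, as in the post)."""
    for _ in range(n_iters):
        node_accs = [local_estep(chunk, params) for chunk in chunks_by_node.values()]
        params = mstep(node_accs, params)
    return params


def demo():
    # Hypothetical "utterance files": name -> feature values.
    files = {
        "a.wav": [0.1, -0.2, 0.0, 0.3],
        "b.wav": [5.1, 4.8, 5.2],
        "c.wav": [-0.1, 0.2, 4.9, 5.0],
        "d.wav": [5.3, 0.05],
    }
    owner, loads = assign_files({name: len(s) for name, s in files.items()}, n_nodes=2)
    chunks = defaultdict(list)  # node -> its locally cached samples
    for name, node in owner.items():
        chunks[node].extend(files[name])
    params = train(chunks, [(0.5, 1.0, 1.0), (0.5, 4.0, 1.0)])
    return owner, loads, params


if __name__ == "__main__":
    owner, loads, params = demo()
    print("file -> node:", owner, "loads:", loads)
    for w, m, v in params:
        print(f"weight={w:.3f} mean={m:.3f} var={v:.4f}")
```

Because the assignment depends only on file names and sizes, rerunning it yields the same mapping, which is what would let each node keep its chunk cached across passes and across variants of the training recipe; per pass, only a handful of accumulator numbers per node need to travel.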
