Archives


- Beowulf
- Beowulf Announce
- Scyld-users
- Beowulf on Debian

OT: computationally hard or not (algorithm question)

Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.

Search

Robert G. Brown rgb at phy.duke.edu
Sat Mar 24 14:59:05 PST 2001


On Sat, 24 Mar 2001, Frank Joerdens wrote:

> That sounds very interesting indeed! To spill the beans about what I'm
> up to (it's a little embarassing, seeing that you supercomputing people
> are into fairly serious stuff mostly; protein sequences, physics, data
> mining etc.), it's an online community/game where I want to give users
> the option to find "similar" avatars to theirs. Similarity is a
> massively complex notion, psychologically, and ideally this would be a
> problem that you'd attack via some AI strategy (expert systems, or
> systems that "learn" similarity), but the process needs to be _fast_: If
> a user comes to the site, outfits her/his avatar with attributes and
> then hits the go button to look for potential buddies, something around
> a second would be acceptable. This is bearing in mind that there might
> be dozens of those queries running more or less simultaneously, or in
> quick succession. Hence my thinking that you'd define a set of
> properties according to which every attribute is rated. Consider
> clothing: Every item would get a rating according to coolness, elegance
> and sportiness (we played that through; this would be _way_ too narrow,
> you'd need a quite a few more differentiated categories), which means
> you're looking at a 3D space where each dimension might be only 3
> dimensions deep (2 = very, 1 = kind of, 0 = not at all). Actually, I am
> thinking now that I could do with a depth of only 3 or 4 but that I'd
> still need around a dozen dimensions (we haven't fixed the property set
> yet).
>
> Do you have a starting point from where to dive into this body of
> literature by any chance?

Similarity is my business.  Read about Parzen-Bayes classifiers -- this
isn't exactly what you want (because you don't have discrete
identifiable classes) but the notion of a classification/similarity
metric is developed there.  The next interesting idea to explore would
be using a neural network.  One can build a kind of network that
identifies a given individual out of a crowd, and then "score" the crowd
with the network to find near misses.  Expensive on a per-person basis,
but depending on how hard you want to work you might be able to
distribute the time.  This would also parallelize very nicely -- train
nets for each person in parallel on many hosts, coarsely aggregate them
by score, and perhaps identify a classification schema a posteriori.

The same kind of thing is VERY useful in e.g. Web Business (or any
business) identifying (potential) customers "like" your best customers,
for example.

> I don't read C too well but as far as I could make out, the program you
> enclosed (great!) implements Robert's first suggestion to
>
> -------------------------- begin quote --------------------------
> to pick an element (or a point), evaluate the distance of each element
> in the elements from the element or point (storing the result in e.g.
> distance[i]) and sort the indices of distance (not the list itself).
> -------------------------- end quote --------------------------
>
> I'll try that!

Read about Parzen-Bayes first.  This is still they way you'd want to
store the data; PB will just help you work out a way of creating a
"good" similarity metric.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu







More information about the Beowulf mailing list