[Beowulf] clusters in gaming

Thu Feb 1 13:13:48 PST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Eugen Leitl wrote:
> On Fri, Feb 02, 2007 at 01:42:53AM +0800, Dean Michael Berris wrote:
> 
>> I don't see why a usual C/C++ MPI approach wouldn't work, though the
> 
> In theory, there is no reason why these (or even Fortran) wouldn't be adequate either,
> but in practice it would be very difficult to accommodate user-contributed
> scripted objects into a rigid array/pointer framework. Adding new
> methods in C to a brand new object instantiated at runtime is certainly
> possible, but it sounds intensely painful. 
> 

Actually, I was thinking more of implementing just the primitive
operations as either free functions or functors in C++, and using a
chaining approach to build more complex functors from them. The idea is
that once complex operations can be "generated" and "(de)serialized",
new behavior is effectively just a combination of old/primitive operations.
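A minimal sketch of the chaining idea, assuming scalar primitives and a composite represented as an ordered list of primitive names. The names "double", "negate" and "inc", the registry, and the Chain type are hypothetical and for illustration only. Because a composite is just a list of names, serializing it and rebuilding it on another node is straightforward:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A primitive operation on a scalar quantity (hypothetical; real
// primitives would act on richer simulation state).
using Op = std::function<double(double)>;

// Registry of named primitives; the names are illustrative only.
const std::map<std::string, Op>& primitives() {
    static const std::map<std::string, Op> table = {
        {"double", [](double x) { return 2.0 * x; }},
        {"negate", [](double x) { return -x; }},
        {"inc", [](double x) { return x + 1.0; }},
    };
    return table;
}

// A composite operation is an ordered chain of primitive names.
struct Chain {
    std::vector<std::string> steps;

    // Apply each primitive in order; throws std::out_of_range on unknown names.
    double operator()(double x) const {
        for (const auto& name : steps) x = primitives().at(name)(x);
        return x;
    }

    // Serialize as a space-separated list of primitive names.
    std::string serialize() const {
        std::string out;
        for (std::size_t i = 0; i < steps.size(); ++i) {
            if (i) out += ' ';
            out += steps[i];
        }
        return out;
    }

    // Rebuild a chain from its serialized form, rejecting unknown primitives.
    static Chain deserialize(const std::string& text) {
        Chain c;
        std::istringstream in(text);
        std::string name;
        while (in >> name) {
            if (primitives().count(name) == 0)
                throw std::invalid_argument("unknown primitive: " + name);
            c.steps.push_back(name);
        }
        return c;
    }
};
```

For example, a chain of "inc", "double", "negate" applied to 3.0 gives -8.0, and deserializing its serialized form yields an equivalent chain. A real system would need richer types than double, and probably some agreement between nodes on which primitives exist.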

> From the point of view of running a massively parallel realtime
> (fake) physics simulation with many 10k simultaneous viewers/
> points of input, it looks as if it requires massive numerical
> performance, which suggests C (less C++). Common Lisp now has
> very good compilers, but I wonder how well that translates
> into numerics, and, similar to C++, the unwary programmer can
> produce very slow code (CONSing, GC, etc).
> 

With the advances in C++ optimizing compilers and using modern C++
programming approaches (template metaprogramming, policy-driven
programming, lazy-functional programming, etc.) there's a very good
chance that a lot of the "slow code" can be avoided.
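As one illustration of policy-driven programming, here is a sketch where the integration step of a toy simulator is a compile-time policy rather than a virtual function, so the compiler can inline it in the inner loop. Body, ExplicitEuler, SemiImplicitEuler and Simulator are hypothetical names, not taken from any real engine:

```cpp
#include <vector>

// Hypothetical 1-D body: position and velocity.
struct Body {
    double x;
    double v;
};

// Integration policies: static functions selected at compile time,
// so there is no virtual dispatch in the per-body loop.
struct ExplicitEuler {
    static void step(Body& b, double a, double dt) {
        b.x += b.v * dt;  // position uses the old velocity
        b.v += a * dt;
    }
};

struct SemiImplicitEuler {
    static void step(Body& b, double a, double dt) {
        b.v += a * dt;    // velocity first...
        b.x += b.v * dt;  // ...then position uses the new velocity
    }
};

// The simulator is parameterized on its integration policy.
template <typename IntegrationPolicy>
class Simulator {
public:
    explicit Simulator(double gravity) : gravity_(gravity) {}

    void add(Body b) { bodies_.push_back(b); }

    void step(double dt) {
        for (auto& b : bodies_) IntegrationPolicy::step(b, gravity_, dt);
    }

    const std::vector<Body>& bodies() const { return bodies_; }

private:
    double gravity_;
    std::vector<Body> bodies_;
};
```

Swapping Simulator<ExplicitEuler> for Simulator<SemiImplicitEuler> changes the integrator without any runtime dispatch cost; the trade-off is that the choice is fixed at compile time.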

But of course, there has to be a conscious effort to
profile->benchmark->optimize C++ code, which can only be done if you have
1) time and 2) resources at hand. But seeing how much money's being put
into SL right now, I think it's just a matter of time before the
resources will be available. :)

>> scaling issues of adding a new node to the cluster is certainly a
> 
> There are two types of regions, isolated islands, and addition to the
> main "continent". Both look quite suitable for geometric problem tessellation
> (one node, one region) and incremental node addition as the terrain
> grows.
> 

Sounds simple, but it leads to non-optimal resource allocation. If one
node were allocated to each island, you would run into scaling problems
in very high-traffic regions. That's why an architectural solution
should be found: mapping regions to nodes 1:1 doesn't seem to work,
because if you have 1000 regions mapped 1:1 to nodes and 20k people in
one region, what are the other 999 nodes going to do?

>> problem that may be a hindrance from the implementation -- but one that
>> can be remedied by having local clusters "gridded" together using some
>> protocol.
> 
> As far as I know SL is run on one local cluster, which is why I thought
> of how a Beowulf approach would help. They're complicating it by using
> virtual machines, and packing several VMs on one physical server.
> After (frequent) upgrades servers are restarted in a rolling fashion,
> and I presume snapshot/resume migration is a useful function here.
> But then, there are cluster-wide process migration schemes available,
> which are a lot more fine-grained.
>  

I don't have this information available, though it would be interesting
to see how this really works. Already they're encountering scalability
problems with hundreds of people packed into a region. Apparently it
does work, because people can still (somehow) bear with the performance
degradation in these areas.

>> As for throwing hardware at it, I don't think that's a problem -- that's
>> actually a good solution. That being said, if the implementation was
> 
> I thought the cluster had some 1000 nodes, but 
> http://gwynethllewelyn.net/article119visual1layout1.html
> claims there are just 5000 virtual servers. Maybe they
> just run 5 VServers/node, and there are really 1 kNodes,
> which is a reasonably large cluster for just 16 kUsers
> at peak (not for your garden-variety Beowulf, but
> for a game server).
> 

But the problem is that the physics in areas with a lot of objects is
still computed entirely on the cluster. So adding more people and more
objects will overload the physics engine on their end, and 16k users at
peak can definitely overload the nodes allocated to certain regions.

But then I don't have any idea how they have it coded or implemented, so
I can only speculate.
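I don't know what SL's physics engine does internally, but here is a generic illustration of why dense regions hurt: checking every pair of n objects costs n(n-1)/2 tests, so doubling the objects in a region roughly quadruples the work. A common mitigation is a broad phase such as a uniform grid, sketched below in 2-D with hypothetical names (Obj, count_close_pairs_naive, count_close_pairs_grid). It assumes r > 0 and should return the same count as the brute-force version:

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical 2-D object position.
struct Obj {
    double x;
    double y;
};

// Brute force: n*(n-1)/2 distance checks.
std::size_t count_close_pairs_naive(const std::vector<Obj>& objs, double r) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < objs.size(); ++i)
        for (std::size_t j = i + 1; j < objs.size(); ++j) {
            double ex = objs[i].x - objs[j].x, ey = objs[i].y - objs[j].y;
            if (ex * ex + ey * ey <= r * r) ++hits;
        }
    return hits;
}

// Uniform-grid broad phase: bucket objects into cells of side r, then only
// compare objects in the same or the 8 neighbouring cells.
std::size_t count_close_pairs_grid(const std::vector<Obj>& objs, double r) {
    using Cell = std::pair<long long, long long>;
    auto cell_of = [r](const Obj& o) {
        return Cell{static_cast<long long>(std::floor(o.x / r)),
                    static_cast<long long>(std::floor(o.y / r))};
    };
    std::map<Cell, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < objs.size(); ++i) grid[cell_of(objs[i])].push_back(i);

    std::size_t hits = 0;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        Cell c = cell_of(objs[i]);
        for (long long dx = -1; dx <= 1; ++dx)
            for (long long dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(Cell{c.first + dx, c.second + dy});
                if (it == grid.end()) continue;
                for (std::size_t j : it->second) {
                    if (j <= i) continue;  // count each pair only once
                    double ex = objs[i].x - objs[j].x, ey = objs[i].y - objs[j].y;
                    if (ex * ex + ey * ey <= r * r) ++hits;
                }
            }
    }
    return hits;
}
```

The grid only pays off when objects are spread out; if everyone piles into the same few cells, as in a crowded event, it degrades back toward the quadratic case.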

>> already good to start with then adding more hardware would have
>> (supposedly) better effect on the overall performance/experience.
> 
> It would be really interesting to learn how current SL scales.
>  

I look forward to reading something about that.

>> I think it's an architecture problem more than anything as far as the SL
>> server side is concerned. But then when you're faced with a problem like
>> full-3D physics engine in the server side, that's not something "as easy
>> as Hello, World" to implement (or fix, for that matter).
> 
> OpenCroquet uses a deterministic computation model, which replicates
> worlds to the end user nodes a la P2P, and synchronizes differing inputs
> so that each simulation instance doesn't diverge. It can also do a master/slave
> type of state replication, if I understand it correctly, so in
> theory it could use physics accelerators, and clone state to slower
> nodes. SL, in comparison, does just about everything but primitive
> rendering cluster-side. Given current asymmetric broadband, this seems
> to be a superior approach to doing everything P2P. (And I would imagine
> OpenCroquet hasn't even begun to deal with the nasty problem of NAT
> penetration).
>  

Doing everything server-side is a good idea, especially for giving a
better client-side experience, IF the server can handle it. Apparently,
SL on the server side is hitting the limits that their architecture has
either explicitly or implicitly defined.

It still sounds like the architecture might need more work to improve
current performance.
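For reference, the deterministic model described for OpenCroquet in the quoted text can be sketched generically as lockstep replication: every replica starts from the same state and applies the same inputs in the same order, so the replicas stay identical, and exchanging a checksum detects divergence. This is not Croquet's actual code; World, Input and the FNV-style checksum are hypothetical, and integer state is used because floating-point results can differ across compilers and hardware:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical input message: which avatar moves, and by how much, on a tick.
struct Input {
    std::uint32_t tick;
    std::uint32_t avatar;
    std::int32_t dx;
};

// Replicated world state, kept in integers so every replica computes
// bit-identical results.
struct World {
    std::vector<std::int64_t> positions;
    std::uint64_t tick = 0;

    // Apply one tick. All replicas must receive the same inputs in the
    // same agreed order; inputs for other ticks or unknown avatars are ignored.
    void advance(const std::vector<Input>& inputs) {
        for (const Input& in : inputs) {
            if (in.tick == tick && in.avatar < positions.size())
                positions[in.avatar] += in.dx;
        }
        ++tick;
    }

    // Cheap FNV-style state checksum that replicas can exchange
    // to detect divergence.
    std::uint64_t checksum() const {
        std::uint64_t h = 1469598103934665603ULL;  // FNV offset basis
        for (std::int64_t p : positions) {
            h ^= static_cast<std::uint64_t>(p);
            h *= 1099511628211ULL;  // FNV prime
        }
        return h ^ tick;
    }
};
```

If one replica misses an input for a tick, its checksum stops matching the others, which is the point at which a real system would have to resynchronize that replica.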

>> Though it certainly is not "impossible", "hard" would be an
>> understatement especially now that it's in full-deployment with
>> thousands of people getting on it at any given time.
> 
> It's really interesting. I wish there was more information flow out
> of Linden Labs, on how they're doing it.
> 

I wish the same... They've open-sourced the viewer; I just hope they
open-source the server too.

- --
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFwlgMBGP1EujvLlsRAj6ZAKCzSSXKrGU2RaKeTDhB/Tf3vgLKfwCfWszt
nrL+cl7CvnRMaSm2QWQg6Tk=
=owi6
-----END PGP SIGNATURE-----