Advice for 2nd cluster installation

Thu Jan 9 18:33:38 PST 2003

Dear experts,

We installed a 32-node cluster of dual 1 GHz P3 machines a year ago.  Its
performance is excellent, and its stability has been fairly good.

Because the system load keeps increasing, we plan to add more nodes
within the next six months.  I would appreciate the experts' advice on
the following questions:

(1) Most of the major vendors now propose P4 nodes of at least 2.2 GHz.
    Given the difference in processor speed, the new nodes will
    definitely have to be separated from the old ones.  We can either
    create new batch queues for the new nodes (while still sharing the
    old file system), or build an entirely new cluster with its own
    front-end node and file system.
    What are the relative advantages and disadvantages?

(2) If we do decide to build a completely new cluster for all the new
    nodes, i.e. with its own front-end node and file system, is it
    possible to build some backup resilience between the two clusters
    as well?

    During the past year we had about 8 compute-node failures, all of
    them caused by a failed fan in the power supply.  Losing 1-2 nodes
    for around 4 hours never affected the overall operation of the
    cluster.
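    For scale, a rough upper bound on the capacity lost to these
    failures, assuming each incident took out 2 nodes for 4 hours,
    against a year of 32 nodes running around the clock:

```latex
\frac{8 \times 2\ \text{nodes} \times 4\ \text{h}}{32\ \text{nodes} \times 8760\ \text{h}}
= \frac{64\ \text{node-h}}{280\,320\ \text{node-h}} \approx 0.023\%
```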

    However, on one occasion the master RAID system failed.  No user
    could log in for almost 12 hours, because "/home" was completely
    unavailable during that time.

    One possible solution is a SAN approach in which the two file systems
    are always mirrored.  Would that be very expensive?

    Another way is simply to cross-mount the two file servers.
    Most likely the postgrads would be on the old server and the
    researchers on the new one.  During normal operation each cluster
    would use only its local file system, but the two servers would be
    synchronized with "rsync" every night.

    If one file system becomes inaccessible, all users would be given
    access to the remaining file system (after the sysadmin has done
    some work).

    This sounds complicated, but it should be much cheaper.  Does anyone
    have experience with such a setup?
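    The nightly synchronization could be driven by a small script run
    from cron on each file server.  This is a minimal sketch, assuming
    rsync over ssh between the two servers; the host name "newfs", the
    mirror paths and the function names are hypothetical.  It uses only
    standard rsync options: -a (archive mode: recurse and preserve
    permissions, ownership and times), -H (preserve hard links),
    --delete (remove files on the mirror that no longer exist at the
    source) and --dry-run (report without changing anything).

```python
import shlex
import subprocess
from typing import Iterable, List, Tuple


def rsync_command(src_dir: str, dest_host: str, dest_dir: str,
                  delete: bool = True, dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors src_dir onto dest_host:dest_dir over ssh."""
    # A trailing slash on the source makes rsync copy the directory's contents,
    # not the directory itself, so repeated runs land in the same place.
    src = src_dir.rstrip("/") + "/"
    dest = f"{dest_host}:{dest_dir.rstrip('/')}/"
    cmd = ["rsync", "-aH", "-e", "ssh"]
    if delete:
        cmd.append("--delete")   # also remove files that were deleted at the source
    if dry_run:
        cmd.append("--dry-run")  # only report what would change
    return cmd + [src, dest]


def nightly_mirror(jobs: Iterable[Tuple[str, str, str]], dry_run: bool = False) -> int:
    """Run one rsync per (src_dir, dest_host, dest_dir) job; return how many failed."""
    failures = 0
    for src_dir, dest_host, dest_dir in jobs:
        cmd = rsync_command(src_dir, dest_host, dest_dir, dry_run=dry_run)
        print("running:", shlex.join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical layout: on the old server, copy the postgrads' homes to the new one.
    nightly_mirror([("/home/postgrad", "newfs", "/mirror/home/postgrad")], dry_run=True)
```

    Keeping the mirror in a separate directory rather than on top of the
    other server's live /home means a bad night's run cannot overwrite
    the other group's working files.  Note that --delete also carries
    accidental deletions over to the mirror at the next run, so the
    mirror is a standby copy, not a backup.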

(3) Most of the major vendors also proposed a blade-server approach as an
    alternative to conventional 1U servers.  By packing 14 processor
    boards into a blade centre (really just another type of rack-mounted
    chassis, occupying 7U), the "processor density" can be doubled.
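    The doubling follows from the rack-space arithmetic, assuming each
    blade holds the same number of processors as one 1U server:

```latex
\frac{14\ \text{boards}}{7\,\mathrm{U}} = 2\ \text{boards/U}
\qquad\text{vs.}\qquad
\frac{1\ \text{board}}{1\,\mathrm{U}} = 1\ \text{board/U}
```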

    However, when I asked them the question below, they could not give
    me a definite answer (or at least not one that convinced me).

    The question is: when a processor in the 3rd blade inside the 3rd
    chassis communicates (through MPI) with a processor in another blade
    within the SAME chassis, will the timing differ from communicating
    with a processor in a blade housed in ANOTHER chassis?

    A salesman from one vendor told me there should be some difference,
    because communication within a blade centre goes through the
    backplane, whereas once it leaves the blade centre it has to pass
    through an inter-chassis switch, which adds some delay.  He further
    told me that this is where "InfiniBand" shines, a technology I have
    no experience with.

    However, a salesman from another vendor told me the timing should be
    the same, because all processors within the entire "rack" should have
    distinct IP addresses, so communication between any two processors
    should be fair and equal.

    Which is right?

THANKS in advance for any expert advice.

W.K. Kwan
Computer Centre
University of Hong Kong