[Beowulf] Question on high performance, low cost Fileserver
mwill at penguincomputing.com
Mon Nov 21 17:12:23 PST 2005
I have not completely thought this out yet, but what about something like
Have 1U or 2U servers with internal drives split into two (software)
Connect them with 1 ethernet cable to a switch fabric to serve half of
Connect them in pairs with the second gigabit ethernet cable directly
same cable as non-crossover since it is gigabit, most mainboards have
two nics down
use network-block-device to mirror one of the two groups to the other
If one node goes down, or a drive in a node goes down, you can take it
down without the data being offline, since it is mirrored onto a second
system. This
means you can save the money for a RAID controller and do software RAID,
but will spend more money on extra drives because of the RAID1 across
two machines.
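
A rough way to put numbers on that trade-off, as a Python sketch. It
assumes only the cross-machine RAID1 costs capacity (any RAID level used
inside a node is ignored), and the drive counts and sizes in the usage
note are made-up illustration values, not a recommendation.

```python
def usable_capacity_tb(nodes, drives_per_node, drive_tb):
    """Usable capacity when all data is mirrored (RAID1) onto a partner node.

    Assumption: only the cross-node mirror costs capacity, so half of the
    raw space is usable.
    """
    if nodes % 2 != 0:
        raise ValueError("nodes are deployed in mirrored pairs")
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb / 2


def nodes_needed(target_tb, drives_per_node, drive_tb):
    """Smallest even node count whose mirrored capacity reaches target_tb."""
    nodes = 2
    while usable_capacity_tb(nodes, drives_per_node, drive_tb) < target_tb:
        nodes += 2
    return nodes
```

With hypothetical nodes holding four 0.5TB drives each,
nodes_needed(3, 4, 0.5) gives 4 nodes for the 3TB starting point and
nodes_needed(12, 4, 0.5) gives 12 nodes for the 12TB end of the range.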
If you use heartbeat and service failover, you might even be able to
service takeover between the two machines.
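
The takeover idea can be sketched as a toy model in Python. This is not
the API or configuration of the Linux-HA heartbeat package; the class
names, the service names and the 10 second deadtime are all hypothetical,
and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_seen: float = 0.0                       # time of last heartbeat
    services: set = field(default_factory=set)   # services this node exports


class Pair:
    """Two directly connected nodes that watch each other's heartbeat."""

    def __init__(self, a, b, deadtime=10.0):
        self.nodes = {a.name: a, b.name: b}
        self.deadtime = deadtime  # hypothetical value, in seconds

    def beat(self, name, now):
        """Record a heartbeat from the named node at time `now`."""
        self.nodes[name].last_seen = now

    def check(self, now):
        """Move services off a node that has been silent longer than deadtime.

        Returns the names of the nodes whose services were taken over.
        """
        a, b = self.nodes.values()
        taken_over = []
        for dead, alive in ((a, b), (b, a)):
            dead_silent = now - dead.last_seen > self.deadtime
            alive_ok = now - alive.last_seen <= self.deadtime
            if dead_silent and alive_ok and dead.services:
                alive.services |= dead.services
                dead.services = set()
                taken_over.append(dead.name)
        return taken_over
```

A real setup would also have to fence the failed node before the survivor
starts serving its data, which this sketch leaves out.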
On top of that you can now run PVFS to aggregate the distributed storage
into a single image without losing your data if a node goes down for good.
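
As a small Python model of why this survives a node loss: file data is
striped across the mirrored pairs, and a file stays readable as long as
each pair still has one node up. The round-robin layout and the 64KB
strip size are assumptions for illustration (the distribution PVFS
actually uses is configurable), and the node names are hypothetical.

```python
STRIP_SIZE = 64 * 1024  # assumed strip size in bytes, for illustration only


def pair_for_offset(offset, num_pairs, strip_size=STRIP_SIZE):
    """Index of the mirrored pair holding the byte at `offset`,
    assuming simple round-robin striping across the pairs."""
    return (offset // strip_size) % num_pairs


def file_available(pairs, failed):
    """True if every mirrored pair still has at least one live node.

    pairs  -- list of (node_a, node_b) name tuples
    failed -- set of node names that are down
    """
    return all(a not in failed or b not in failed for a, b in pairs)
```

With pairs [("n1", "n2"), ("n3", "n4")], losing n1 and n4 together still
leaves the striped file readable, while losing both n1 and n2 does not.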
Paulo Afonso Lopes wrote:
> GFS and GPFS are SAN-based. I do not have any experience with Lustre, but
> it seems (at least in a supported - by the vendor - configuration) to be
> based on a "back-end" SAN.
> What you have to deal with, using currently available solutions, is this
> kind of decision:
> - Do you rate availability/fault tolerance as important?
> If you do (why else would you say PVFS is not for home dirs?) you must
> use a disk array based solution, either with an FC-SAN or NAS or iSCSI.
> Then, you must choose your "file system" (not for the NAS option). You'll
> have to decide if:
> - You need "POSIX locking": if you do, you can't use PVFS
> - You will want to support applications that both require high I/O
> bandwidth and heavy file sharing (RW the same file): if you do, you must
> exclude GFS, use GPFS in "data shipping mode" and modify your applications
> (Note: you can have a resilient PVFS configuration if you use a SAN with
> disk arrays instead of "internal" disks, and add some HA software - of
> course, you can "transfer" disks manually, via command scripts, if you do
> not want to use HA software)
> You also need to put the "high cost" of a SAN into context: if you want to
> move data at high speeds in a COTS (Gigabit Eth) LAN, you will consume all
> the available CPU (e.g. around 40% of a 2.6GHz Xeon to reach around 80MB/s
> sustained in one node). If you go for "fancy" interconnects (Infiniband,
> Myrinet,...) you are in the same "cost territory" as FC/SANs
> By NOT using "asymmetrical" file systems (such as PVFS) and using "cluster
> file systems" such as GFS or GPFS you may (depending on your requirements)
> dispense with I/O nodes (client nodes on a SAN can directly access data)
> I have never been involved in a large configuration like the one you're
> planning to build, but I honestly think that you should go for a "mix" of
> HA filesystem (e.g., GFS) for homes, etc. (mostly unshared file access)
> and PVFS for the directories where files for HPC applications do live. I
> don't think there is a single, currently available file system, that can
> do both things well.
>> We are looking into designing a low cost, high performance storage system.
>> Requirements as below:
>> - Starts at 3TB, should scale up by adding more servers to say 10-12TB
>> - Use commodity technologies (x86_64, IB, GE, Linux), preferably all OSS
>> - Provide high I/O which scales with addition of storage nodes.
>> - To be used for hosting user home dirs so reliability is important
>> - The HPC cluster starts with 6 AMD64 nodes and is expected to scale to
>> 1000+nodes in a year.
>> - Preferably without FC/SAN
>> We do have experience with IBM GPFS, PVFS (1,2), NetApps, PolyServe but
>> not with GFS and LUSTRE.
>> PVFS is not reliable enough for home dirs (OK for scratch), GPFS cannot
>> do RAID5 like striping across nodes, needs SAN for RAID1 like
>> (cost $$$) , polyserve is too expensive (per CPU pricing)
>> Is GFS or Lustre suitable for the above needs? Any other commercial
> Paulo Afonso Lopes | Tel: +351- 21 294 8536
> Departamento de Informática | 294 8300 ext.10763
> Faculdade de Ciências e Tecnologia | Fax: +351- 21 294 8541
> Universidade Nova de Lisboa | e-mail: pal at di.fct.unl.pt
> 2829-516 Caparica, PORTUGAL
Penguin Computing Corp.
mwill at penguincomputing.com