[Beowulf] File server dual opteron suggestions?
jmdavis1 at vcu.edu
Fri Aug 4 00:04:50 PDT 2006
I don't mean to hijack the thread, but if Dave's users can fit the db's
that they are running against (Blast, for instance) in /tmp on the
compute nodes, overall performance increases. This certainly doesn't
work with genbank (unless you have 130+GB of /tmp). But it does work well
with nr, uniprot, and the other protein db's. I run relatively large
/tmp filesystems on my nodes (55-100GB). But my nodes are more general
purpose and may be running blast one day, Gaussian 03 or VASP the next,
and Fluent or abaqus after that.
The performance increase will depend on the size of the db, the size of
client and server caches, and the number of spindles.
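To make the /tmp staging idea concrete, here is a minimal sketch of a pre-job step that copies a database directory to node-local scratch only when it fits, and otherwise falls back to the shared copy. The function names, the 10% headroom, and the directory layout are assumptions for illustration, not something from this thread.

```python
import os
import shutil


def fits_in_scratch(db_bytes, free_bytes, headroom=0.10):
    """True if db_bytes fits in free_bytes while leaving `headroom` of it unused."""
    return db_bytes <= free_bytes * (1.0 - headroom)


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def stage_db(src_dir, scratch="/tmp", headroom=0.10):
    """Copy src_dir under scratch if it fits; return the directory a job should use."""
    dest = os.path.join(scratch, os.path.basename(os.path.normpath(src_dir)))
    if os.path.isdir(dest):
        return dest  # already staged by an earlier job
    free = shutil.disk_usage(scratch).free
    if not fits_in_scratch(dir_size(src_dir), free, headroom):
        return src_dir  # too big for local scratch: keep reading the shared copy
    shutil.copytree(src_dir, dest)
    return dest
```

A job script would call something like stage_db("/shared/db/nr") (a hypothetical path) and point the search at whatever directory comes back. This sketch does not check whether an already staged copy is stale or only partly copied; a real setup would want that.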
Joe Landman wrote:
> Mark Hahn wrote:
>>> I would recommend upping the memory. Computing or not, large buffer
>>> caches on file servers are with very rare exception, a preferred
>> unclear. the FS's memory does act as an excellent cache, but then
>> the client memory does too. do you have a pattern of file accesses
>> in which
>> the same files are frequently re-read and would fit in memory? the
>> I've looked at closely have had mostly write and attribute activity,
>> since the client's own cache already has a high hit-rate. for
>> writes, of
>> course, more FS memory is not important unless you have extremely high
> I was actually assuming read-dominated. Dave does informatics as I
> remember, and most of the informatics we have dealt with tends to be
> read dominated. Doesn't mean much without the workload info,
> though. So I agree with the caution, though I humbly note that a 1GB
> stick costs about $120 +/- a bit these days. I.e., it is not a large
> price, and the potential impact on performance is much higher than for
> 10k RPM drives.
> FWIW I have a pair of 10k RPM SATA raptors and I am not all that
> impressed with them.
>> bandwidth net and disks. in fact, I've been using the following
>> # delay writing dirty blocks hoping to collect further writes (default 30s)
>> vm.dirty_expire_centisecs = 1000
>> # try writing back every 1s (default 500=5s)
>> vm.dirty_writeback_centisecs = 100
>> in short, don't bother working at write caching much. with a lot of
>> an untuned machine will exhibit unpleasant oscillations of delaying
>> then frantically flushing.
> Yup. I had my dirty around 250 for a long time. Write caching is
> harder because if you really want to play it safe, you shouldn't cache
> the write ...
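Since the two writeback knobs quoted above are in centiseconds, a small sketch that parses sysctl-style lines and prints them in seconds may help when comparing against the defaults mentioned in the comments. The helper names are made up for illustration; the settings text is copied from the quoted message.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def centisecs_to_seconds(value):
    """Convert a *_centisecs setting to seconds."""
    return value / 100.0


# Settings as quoted in the thread.
QUOTED = """
# delay writing dirty blocks hoping to collect further writes (default 30s)
vm.dirty_expire_centisecs = 1000
# try writing back every 1s (default 500=5s)
vm.dirty_writeback_centisecs = 100
"""

for key, value in parse_sysctl(QUOTED).items():
    print(f"{key}: {centisecs_to_seconds(value):g} s")
```

Read this way, the quoted values come out as a 10 s expiry and a 1 s writeback interval, against the 30 s and 5 s defaults given in the comments.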
>>> 2GB/socket minimum. Nothing serves files faster than having them
>>> already sitting in ram.
>> true, but is that actually your working set size? it would be rather
>> embarrassing if 3 of the 4 GB were files read once a month...
> Hmmm... again, this is a good workload problem. If Dave's users are
> going through big "databases" from NCBI, lots of ram is a good thing.
> If it is just a buncha small files, yeah, could be overkill.
> But if I had to spend extra $$ on ram versus 10kRPM drives, I know
> where I would spend it ...
>>>> 4 x 74 GB disks Ultra320 (or make an argument for a particular SATA)
>> SATA disks are SATA disks, of course. dumb controllers are all pretty
>> similar as well (cheap, fast, not-cpu-consuming). if you have your
>> heart set on HW raid, at least get a 3ware 9550, which is quite fast.
>> (most other HW raid are surprisingly bad.)
> The LSI SAS unit is pretty good. I like the 3ware, the Areca, and a
> few others. We just created a nice 500+ MB/s "file server" for a
> large customer out of an Areca card, 16 spindles and some tweaking. I
> haven't seen production performance data for it yet, but our in house
> testing exceeded the 500 MB/s by a little bit.
>>>> dual 10/100/1000 ethernet on the mobo
>>> Careful on this... we and our customers have been badly bitten by
>>> tg3 and broadcom NICs. If the MB doesn't have Intel NICs, get an
>>> Intel 1000/MT dual gigabit card. You won't regret that, and it is
>>> money well spent.
>> that's odd; I have quite a few of both tg3 and bcm nics, and can't
>> say I've had any complaints. what are the problems?
> Interrupted to death. The tg3 doesn't seem to have NAPI turned on by
> default in the standard distro kernels. Haven't tried the FC* with
> this, hopefully it is saner there. Under heavy load, we see
> interrupts climb past 40k/s, and it context switches like mad. Seen
> this from early 2.6 through 2.6.13 on SuSE and RHEL. Makes using AOE
> (Coraid) nearly useless with Broadcom, formatting the unit with ext3
> renders the server unusable for hours. Drop a nice Intel unit in
> there, do the same thing and it works great, server is responsive
> during formatting. Same issues for file service and heavy load.
> Seen this on Tyan, iWill, Arima?, MSI(ibm e32*), and others.
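For anyone wanting to check whether their own NIC shows the interrupt rates described above, one rough approach is to take two snapshots of /proc/interrupts a known interval apart and difference the counts for the interface. The sketch below assumes the usual layout of that file (an IRQ label ending in ":", per-CPU counts, then controller and device names); the function names and the sample snapshots are invented for illustration and are not measurements from this thread.

```python
def irq_count(snapshot, device):
    """Sum the per-CPU interrupt counts on lines that name `device`."""
    total = 0
    for line in snapshot.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # header line or blank line
        names = [f.rstrip(",") for f in fields[1:]]  # shared IRQs are comma separated
        if device not in names:
            continue
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU counts end where the controller name begins
            total += int(field)
    return total


def irq_rate(before, after, device, interval_s):
    """Interrupts per second for `device` between two snapshots."""
    return (irq_count(after, device) - irq_count(before, device)) / interval_s


# Made-up snapshots, 5 seconds apart, in /proc/interrupts layout.
BEFORE = """
           CPU0       CPU1
  0:       1000          0    IO-APIC-edge  timer
 16:      50000      30000   IO-APIC-level  eth0
"""
AFTER = """
           CPU0       CPU1
  0:       6000          0    IO-APIC-edge  timer
 16:     250000      30000   IO-APIC-level  eth0
"""

print(irq_rate(BEFORE, AFTER, "eth0", 5.0))
```

On a live machine the two snapshots would come from reading /proc/interrupts twice with a sleep in between.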
>>>> case - 2U (big enough for adequate ventilation, right?)
>>> Yeah, just make sure you have good airflow.
>> 2U still requires a custom PS, doesn't it? it's kind of nice to be
>> able to put in an ATX-ish PS. and is 2U tall enough for stock/standard
> Don't know if it is custom. I like the redundant PS, but the small
> redundant PSes tend not to supply enough current to boot the system.
> Need a 3U case for that.
> Best cooling designs I have seen involve baffles, and a pull or
> push-pull config. We have used some units where under load the
> processors are happily working around 22-28C. Fans are loud though.
> Case (1U) is very cool to the touch.
> For 2U you still need to worry about flow. I find it hard to believe
> that most people get efficient flow out the back grating on 2U and
> larger without a helper fan of some sort.