[Beowulf] zfs tuning for HPC/cluster workloads?

Loic Tortay tortay at cc.in2p3.fr
Sun Jul 6 12:17:12 PDT 2008


Joe Landman wrote:
> 
>   Investigating zfs on a Solaris 10 5/08 loaded JackRabbit for a 
> customer.  zfs performance isn't that good relative to Linux on this 
> same hardware (literally a reboot between the two environments)
> 
>   I am looking for ways to tune zfs, or even Solaris so we can hopefully 
> get to parity with Linux (less than 50% of Linux performance now c.f. 
> http://scalability.org/?p=640 ).  What I have found online has been
>     http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> 
>   and a number of others.  Even the dangerous tuning methods mentioned 
> in this document (turning off Zil), don't really help all that much.
> 
>   This is for an IO intensive application with multiple threads doing 
> 1-100 GB streaming reads.
> 
>   Offline from others I have heard similar issues, so if someone knows 
> how to tweak the OS or FS to get good performance, please fire me over a 
> pointer ... I would appreciate it !
> 
We have seen the same issue on (non-Sun) high-density storage servers, which 
performed well with RHEL5 & XFS but comparatively poorly with Solaris 
10 & ZFS.

ZFS seems to be extremely sensitive to the quality/behaviour of the driver 
for the HBA or RAID/disk controller, especially with SATA disks (for NCQ 
support).  Having a driver is not enough; a good one is required.

Another point is that ZFS requires a different configuration "mindset" than
"ordinary" RAID.
Have you noticed the "small vdev" advice on the Solaris Internals Wiki?
This is probably the single most important hint for ZFS configuration.
IOW, most of the time you can't just use the same underlying configuration 
with ZFS as the one you (would) use with Linux.
This means that you may need to trade usable space for performance,
sometimes in more drastic ways than with ordinary RAID.
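
To make the space/performance trade-off concrete, here is a rough sketch
(not taken from the wiki) that splits a fixed number of disks into equal
raidz vdevs and reports how much raw capacity is left for data.  The
48-disk count, the candidate widths and the function name are assumptions
chosen only for illustration, and "more vdevs is faster" is a rule of
thumb, not a measurement: actual throughput depends on the workload, the
disks and the driver.

```python
def raidz_layout(total_disks, vdev_width, parity=1):
    """Split total_disks into equal raidz vdevs of vdev_width disks.

    Hypothetical helper for illustration only. Returns the number of
    vdevs, the number of disks holding data, the disks left over
    (e.g. usable as hot spares) and the usable fraction of raw capacity.
    """
    if vdev_width <= parity:
        raise ValueError("vdev_width must be larger than parity")
    if vdev_width > total_disks:
        raise ValueError("vdev_width cannot exceed total_disks")
    vdevs = total_disks // vdev_width
    leftover = total_disks - vdevs * vdev_width
    data_disks = vdevs * (vdev_width - parity)
    return {
        "vdevs": vdevs,
        "data_disks": data_disks,
        "leftover": leftover,
        "usable_fraction": data_disks / total_disks,
    }


if __name__ == "__main__":
    TOTAL_DISKS = 48  # assumed chassis size, not from the original thread
    print("width  vdevs  data  leftover  usable")
    for width in (4, 6, 8, 12, 24, 48):
        layout = raidz_layout(TOTAL_DISKS, width)
        print("%5d  %5d  %4d  %8d  %5.1f%%" % (
            width,
            layout["vdevs"],
            layout["data_disks"],
            layout["leftover"],
            100.0 * layout["usable_fraction"],
        ))
```

Narrow vdevs give ZFS more top-level vdevs to stripe across, at the cost
of more disks spent on parity; one very wide vdev keeps the most space
but leaves ZFS a single vdev to work with.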

Finally, like it or not, ZFS is often happier and more efficient when it does 
the RAID itself (no "hardware" RAID controller or LVM involved).


Loïc.

PS: regarding your other message in this thread (and your blog), you seem 
confused: the "open source" OS is OpenSolaris, not Solaris 10.
The benchmark publishing restriction only applies to Solaris 10 (see 
<http://www.opensolaris.com/licensing/opensolaris_license/>).
PPS: while I dislike Sun's policy, I specifically remember being told by 
someone from a DOE lab (who did actually evaluate your product about 18 
months ago) that you didn't want their unfavorable benchmark results to be 
published.  You can't have it both ways.
-- 
|    Loïc Tortay <tortay at cc.in2p3.fr> -     IN2P3 Computing Centre     |


