I've seen an HP "LeftHand" / StorageWorks P4000 SAN device recently and got quite a good impression of it. One thing that occurred to me is: why didn't anyone try this before? Both Linux (to a lesser extent) and FreeBSD (to a somewhat greater one) contain the pieces for it, and have for some years now. In fact, several people have built such setups privately or internally for their companies, but there was apparently never a concerted effort to sell it.
LeftHand is a "network RAID" SAN product built around Linux with some custom internal software, running on completely commodity / COTS ProLiant hardware. It offers iSCSI storage with redundancy built over the network (Ethernet) across multiple servers. Each box is a separate, complete server containing an arbitrary set of drives in a RAID volume. Multiple boxes are then combined in a RAID-like setup (a RAIS: Redundant Array of Inexpensive Servers) that offers SAN volumes at a desired RAID level, using the servers as lower-level storage. An example setup might consist of three boxes, each with 8 drives in RAID-5, exporting three volumes: one RAID-5 volume spanning the three servers (in effect a RAID-55 setup), one RAID-1 volume spanning two servers (RAID-15), and one volume served from only one server.
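To make the cost of stacking redundancy concrete, here is a small Python sketch that multiplies the space efficiency of each layer. It is only an illustration, not anything from LeftHand; the function names are made up, and it assumes equal-sized members while ignoring file-system and metadata overhead.

```python
from fractions import Fraction


def raid5_efficiency(members: int) -> Fraction:
    """Usable fraction of an n-member RAID-5: one member's worth goes to parity."""
    if members < 3:
        raise ValueError("RAID-5 needs at least 3 members")
    return Fraction(members - 1, members)


def raid1_efficiency(members: int) -> Fraction:
    """Usable fraction of an n-way mirror: only one copy's worth is usable."""
    if members < 2:
        raise ValueError("RAID-1 needs at least 2 members")
    return Fraction(1, members)


def nested_efficiency(*levels: Fraction) -> Fraction:
    """Stacked layers multiply: each layer only sees what the one below exposes."""
    result = Fraction(1)
    for level in levels:
        result *= level
    return result


# Lower level: each server is an 8-drive RAID-5, as in the example setup.
per_server = raid5_efficiency(8)

# Upper level: RAID-5 across three servers, and a mirror across two servers.
raid55 = nested_efficiency(per_server, raid5_efficiency(3))
mirrored = nested_efficiency(per_server, raid1_efficiency(2))

print(f"single server RAID-5: {per_server}")
print(f"RAID-55 over 3 servers: {raid55} ({float(raid55):.1%})")
print(f"RAID-1 over 2 servers: {mirrored} ({float(mirrored):.1%})")
```

With 8-drive RAID-5 boxes, the network RAID-5 volume keeps 7/12 of the raw drive space and the cross-server mirror keeps 7/16; the server-level redundancy is paid for in capacity.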
This has been possible in FreeBSD at least since ZFS was imported, around 3 years ago, but can also be achieved with "lesser" file systems, a volume manager and software RAID. Here is how the example setup could be achieved:
- Configure three individual servers (s1, s2, s3) with some drives; in each server, make a single large ZFS pool (e.g. RAID-Z) from all the drives, or use a hardware RAID controller and create a simple single-device ZFS pool on top of its volume (boot from an internal USB key if bootability, an operating system disk or space is an issue; lots of modern servers have internal USB ports for things like this and VMware).
- Plan your end layout. Let's say each server holds 10 TB of user-available storage and we want to use 6 TB from each server to create the big RAID-Z volume; the remaining 4 TB on each server will go into either the RAID-1 volume (s1 and s2) or the "plain" volume (s3).
- Use "zfs create -V" to create one 6 TB zvol and one 4 TB zvol on each server.
- Export these volumes over the network, either via iSCSI using ports/net/istgt or via ggated(8) (FreeBSD's GEOM Gate, which is not iSCSI but serves the same purpose).
- Plan which node will be the "head" for each volume. You could also introduce a separate head node that only imports the iSCSI targets, but this could become a bottleneck. Let's say that s1 will be the head for the RAID-Z volume, s2 for the RAID-1 volume and s3 for the plain 4 TB volume.
- On s1, import the other two 6 TB zvols with iscsi_initiator(4) (or ggatec(8) if you used ggated); on s2, import the 4 TB zvol from s1; on s3, do nothing in this step.
- On s1, create a new RAID-Z pool from the local 6 TB zvol and the two iSCSI-imported ones; on s2, create a mirrored (RAID-1) ZFS pool from the local 4 TB zvol and the imported one; on s3, just use the previously created 4 TB zvol.
- You can now use all of the created storage devices however you want. In the case of LeftHand, the end result is again exported over iSCSI, but you can simply create file systems on the end volumes and use them locally. In retrospect, you could probably shave off quite a few heavy layers by using ZFS only for the end volumes and using hardware RAID to provide the volumes from the third step.
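The steps above can be written down as a plan before touching any server. The following Python sketch checks that each server's zvols fit in the 10 TB budget and prints the zfs(8) and zpool(8) commands for each node. The pool name tank, the zvol names and the /dev/da* names that the iSCSI-imported disks receive on the head nodes are all hypothetical placeholders; on a real system, check which device names the imported disks actually get.

```python
from collections import defaultdict

# Hypothetical names throughout: the local pool "tank", the zvol names
# "big"/"small" and the /dev/da* devices the imported zvols appear as.
POOL = "tank"
CAPACITY_TB = 10  # user-available space per server, as in the plan

# (server, zvol name, size in TB): one 6 TB and one 4 TB zvol per server
ZVOLS = [(server, name, size)
         for server in ("s1", "s2", "s3")
         for name, size in (("big", 6), ("small", 4))]

# End pools built on the head nodes: pool -> (vdev type, head, members).
# s3's "small" zvol is the plain volume and needs no pool on top.
END_POOLS = {
    "bigvol": ("raidz", "s1", [("s1", "big"), ("s2", "big"), ("s3", "big")]),
    "mirvol": ("mirror", "s2", [("s2", "small"), ("s1", "small")]),
}

# Hypothetical device names of the imported zvols on each head node
REMOTE_DEVS = {("s2", "big"): "da1", ("s3", "big"): "da2",
               ("s1", "small"): "da1"}


def check_capacity() -> None:
    """Fail loudly if any server's zvols exceed its usable space."""
    used = defaultdict(int)
    for server, _, size in ZVOLS:
        used[server] += size
    for server, total in used.items():
        if total > CAPACITY_TB:
            raise ValueError(f"{server}: {total} TB planned, "
                             f"only {CAPACITY_TB} TB available")


def zvol_commands() -> dict:
    """Per-server 'zfs create -V' commands for the backing zvols."""
    cmds = defaultdict(list)
    for server, name, size in ZVOLS:
        cmds[server].append(f"zfs create -V {size}T {POOL}/{name}")
    return dict(cmds)


def head_commands() -> dict:
    """Per-head 'zpool create' commands: local zvols are used through
    /dev/zvol, remote ones through their (hypothetical) imported device."""
    cmds = defaultdict(list)
    for pool, (vdev, head, members) in END_POOLS.items():
        devs = [f"/dev/zvol/{POOL}/{name}" if server == head
                else f"/dev/{REMOTE_DEVS[(server, name)]}"
                for server, name in members]
        cmds[head].append(f"zpool create {pool} {vdev} " + " ".join(devs))
    return dict(cmds)


if __name__ == "__main__":
    check_capacity()
    for server, cmds in sorted(zvol_commands().items()):
        for cmd in cmds:
            print(f"[{server}] {cmd}")
    for head, cmds in sorted(head_commands().items()):
        for cmd in cmds:
            print(f"[{head}] {cmd}")
```

The sketch only prints commands; configuring the istgt targets and the iscsi_initiator(4) imports is left out, since those are configuration-file steps rather than one-liners.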
Why would someone use such a setup, especially considering it is considerably more complex than just using a simple DAS or SAN storage with a single level of RAID? First and foremost, it's a cheap way to introduce multi-server storage redundancy, while also increasing space. If you use ZFS on the end-result volumes you can automagically extend storage space by adding more boxes. With some fancy scripting, hot failover can be implemented.
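For the "fancy scripting" part of hot failover, one possible building block is a heartbeat that probes each head node's iSCSI port. This Python sketch is only an illustration, not how LeftHand does it: it assumes the default iSCSI target port 3260, and the on_failover callback is hypothetical. What that callback has to do (import the surviving members on another node, re-export the volume) depends on the layout.

```python
import socket
from typing import Callable

ISCSI_PORT = 3260  # default iSCSI target port (assumed here)


def tcp_probe(host: str, port: int = ISCSI_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


class FailoverMonitor:
    """Declare a head node dead after `threshold` consecutive failed probes,
    then call the (hypothetical) `on_failover` callback exactly once."""

    def __init__(self, host: str, on_failover: Callable[[str], None],
                 probe: Callable[[str], bool] = tcp_probe,
                 threshold: int = 3) -> None:
        self.host = host
        self.on_failover = on_failover
        self.probe = probe
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def tick(self) -> None:
        """Run one probe; call this periodically, e.g. from a loop with time.sleep."""
        if self.failed_over:
            return
        if self.probe(self.host):
            self.failures = 0  # any success resets the counter
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.failed_over = True
            self.on_failover(self.host)
```

Requiring several consecutive failures avoids failing over on a single dropped packet. Note that a bare TCP probe cannot tell a dead node from a network partition, so a real setup also needs fencing or a quorum to keep two heads from importing the same pool.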
Of course, Ethernet speed is an issue. A setup like this will only work well with either 10 Gbit NICs or a carefully planned network with multiple 1 Gbit NICs (which is how the low-end LeftHand models work).
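Some back-of-the-envelope numbers show why link speed matters, for example when a fully used 6 TB member of the RAID-Z volume has to be resilvered over the network. This Python sketch uses raw line rates only and ignores Ethernet, TCP and iSCSI overhead, so real figures will be worse.

```python
# Raw line-rate arithmetic only: protocol overhead is ignored.


def link_mb_per_s(gbit: float, links: int = 1) -> float:
    """Raw ceiling in MB/s (10**6 bytes) for `links` links of `gbit` Gbit/s each."""
    return gbit * 1e9 * links / 8 / 1e6


def hours_to_move(tb: float, mb_per_s: float) -> float:
    """Hours to transfer `tb` terabytes (10**12 bytes) at a sustained rate."""
    return tb * 1e12 / (mb_per_s * 1e6) / 3600


for label, rate in [("1 x 1 Gbit", link_mb_per_s(1)),
                    ("4 x 1 Gbit", link_mb_per_s(1, links=4)),
                    ("1 x 10 Gbit", link_mb_per_s(10))]:
    print(f"{label}: {rate:.0f} MB/s, {hours_to_move(6, rate):.1f} h to move 6 TB")
```

A single 1 Gbit link tops out at 125 MB/s, roughly 13 hours for 6 TB. Aggregating several 1 Gbit links (e.g. with lagg(4) and LACP) usually balances traffic per flow, so a single iSCSI session still sees only one link's worth of bandwidth; spreading traffic across several sessions or paths is what makes multiple NICs pay off.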
Why would you buy LeftHand when a setup like this can be built with FreeBSD (and even the saner Linuxen)? Because the LeftHand product has a GUI (albeit a poorly packaged one: it is written in Java, yet ships with native installers and requires its own bundled micro-version of Java instead of a generic one) which condenses all these steps into a few mouse clicks.
(On a tangential topic, ZFS v28 is ready for testing! It brings deduplication, RAIDZ3, removal of log devices and more!)