Tuesday, May 26, 2009

Storage system experiences

Some people dislike Storage Area Networks (SANs). This may be due to complexity; and it may be because SAN management is often delegated to a special group of administrators which may then become an organizational bottleneck.

Personally, I very much like SANs. I do not miss messing around with disks and cables and enclosures in the server room. In my opinion, the complexity of a SAN is easily balanced out by easy and timely allocation of storage chunks to servers. And as a database administrator, I'm very fond of being able to share RAIDs between several servers, resulting in storage backed by a large amount of spindles. Also, it's nice to be able to monitor I/O activity across a large set of systems.

At work, we use two different FC SANs. One SAN connects Windows/ESX/Linux Dell servers to a Hitachi AMS500 storage system, while another SAN connects IBM servers to IBM DS4800 controlled storage.

I've had 2½ years of experience with the IBM storage system which keeps 24TB of data. We've had the AMS500 for ½ a year; it stores around 20TB of data. Both storage systems are configured with RAID5/6 logical drives with varying speed profiles, for different kinds of use patterns.

GUI management interface
The GUI management interface for DS4800, Storage Manager (SM), is vastly superior to AMS500's Storage Manager Modular (SMM). SM lets you assign text lables to your LUs, while SMM only works with numeric LUNs; this means that SM lets you get away with less separately maintained systems documentation. Also, it's much easier to view snapshots of I/O activity in IBMs Storage Manager. And IBMs Storage Manager was very easy to install while Hitachi's Storage Manager Modular required a fair amount of tweaks during installation, due to Java runtime issues (seemingly because Hitachi's software is bundled with an ancient JRE).

Command line interface
For both storage systems, the command line tools use rather awkward syntax and calling conventions. I wish that the programmers of these tools would lean more towards modern Linux/unix command line conventions.

Health monitoring
For both storage systems, you are encouraged to install software which regularly checks storage system health; in case of trouble, it alerts you by e-mail and "phones home" to the respective support organizations. Again, the IBM software is easy to install, while we had problems getting the Hitachi software to install on contemporary server software.

The AMS500 lets you turn on SNMP, so that you can easily poll for the health of the system from your central monitoring system; nice. DS4800 doesn't seem to offer this.

During my 2½ years with the DS4800, we have had three serious breakdowns: Two related to firmware trouble, and one related to a controller hardware defect. This is too much, I think.
We haven't had experienced instability with the AMS500 during the ½ year that we've had it in production.

UPDATE Dec 2009: We got our first stability problem with AMS500 :-(
A misconfigured ESX-cluster generated a large number of I/O requests for a LUN which had been deleted, and the AMS500 started getting periodic absence seizures. In AMS500's defence: Had we been up-to-date with regard to AMS500 firmware, this wouldn't have caused trouble.

Benchmarking storage systems is hard, especially because you normally don't have the luxury of being able to shut down all other I/O than that generated by the benchmark. So although I've conducted an extensive set of performance measurements, I can only say that the systems seem to perform equally well.

One difference, though: Hitachi recommends that you stick with rather narrow RAIDs; if needed, these may then be joined into larger storage areas using a special LUSE feature, or using logical volume management at the hosts. This is somewhat annoying: We would generally like to have wide RAIDs comprising a large number of spindles; AMS500 makes it a bit more complicated to meet this goal.

At the time of our AMS500 acquisition, Hitachi's storage system was substantially less expensive than IBMs comparable offerings. The comparison may be a bit unfair, though, because our procurement is tightly controlled by government procurement contracts.

We haven't used IBMs support offerings much, because we have historically used consultants from an IBM partner for support of our DS4800. But, regarding software and documentation, IBMs support is mostly very good: It's easy to download updates for both firmware, management software, and documentation. IBMs documentation of how to use DS4800 multipathing with Linux is inconsistent and confusing, though.

Hitachi's support is a mixed experience. They have appointed a technician from their Danish office to us, and this works very well: He is easy to get in touch with, and he provides good answers. On the other hand, Hitachi's distribution of software and documentation is miserable: The software is hidden behind a confusing extranet, and even after months of mail/phone correspondance with the extranet support (and other parts of Hitachi's organization), we haven't been able to log in and download software or documentation. So we have resorted to ask our tech contact to send us CDs via snail mail once in a while. Another Hitachi annoyance is licence keys: Why on Earth do we need to enter licence keys when installing multipath driver software (on Windows); as if there would ever be a black market for that kind of software. And when the going gets tough in operations (e.g. if a new server needs to be quickly installed after a server breakdown), it's frustrating to have to spend time trying to dig out that CD with licence keys. Argh! What are they thinking?

Multipathing with the DS4800 works well with Windows and PowerVM servers, but we never got it to work perfect with Linux on Intel. It's strange that IBM puts a significant amount of work into Linux, but still can't make it simple and easy to integrate an important storage product with Linux.

Multipathing with AMS500 works well (and out of the box) on Linux and ESX if you follow certain conventions (Red Hat knowlegebase article; Hitachi ESX configuration document). However, Hitachi's Windows HDLM drivers have trouble discovering new LUs, resulting in the need for several reboots when a new LU is mapped to a Windows host; when the discovery is up and running, things work fine with Windows, too.

Overall experiences
Pro IBM: Good management software, simple distribution of software and documentation.
Con IBM: Hich price. More breakdowns than should be expected, I think.

Pro Hitachi: Good price, good stability so far.
Con Hitachi: Bad management software, very bad distribution of software and documentation.