
Saturday, October 01, 2011

What a dying SFP looks like

Fibre Channel (FC) storage is handy and, in my experience, generally very reliable. I certainly do not miss the days of messing around with disks in a server room. And I like the fact that RAIDs may be cut up into slices (LUNs) which may be shared by many servers, resulting in very efficient use of the disks (if so wanted).

One part about FC that I dislike (in addition to the price tags): SFPs. Why on earth are transceivers not an integral part of a Fibre Channel switch? Having the transceivers be separate units means more electrical contact points, and a potential support mess (it's not hard to imagine a situation where the support contract of an SFP has run out, while the switch itself is still covered).

Anyway: Today, I experienced a defunct SFP for the first time. The following observations may give a hint of how to discover that an SFP is starting to malfunction. The setup is an IBM DS4800 storage system where port 2 on controller B is connected to port 0 on an IBM TotalStorage SAN32B FC switch (which is an IBM-branded Brocade 5100 switch).

Friday morning at 07:49, a few messages like this from the FC switch appeared in syslog:
raslogd: 2011/09/30-07:49:07, [SNMP-1008], 2113, WWN 10:... | FID 128, INFO, IBM_2005_B32_B,  The last device change happened at : Fri Sep 30 07:49:01 2011

At the same time the storage system started complaining about "Drive not on preferred path due to ADT/RDAC failover", meaning that at least one server had started using a non-optimal path, most likely due to I/O timeouts on the preferred path. And a first spike in the bad_os count occurred for the FC switch port:
[Graph: first spike in the bad_os counter for the FC switch port]
bad_os is a counter which exists in Brocade switches, and possibly others. Brocade describes it as the number of invalid ordered sets received.
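For reference, these counters can also be read directly on the switch with Fabric OS's portstatsshow command, where bad_os shows up as er_bad_os. Below is a rough sketch of how one might poll a port's error counters from a monitoring host; the switch name, the SSH access and the exact output format are my assumptions, not something taken from this incident:

#!/usr/bin/env python
"""Rough sketch: read selected error counters for one port on a Brocade
FC switch by running 'portstatsshow' over SSH and parsing its output.
Switch name, SSH login and exact output format are assumptions."""

import subprocess
import sys

SWITCH = "san32b.example.org"   # hypothetical switch host name
PORT = 0                        # switch port to inspect
COUNTERS = ("er_bad_os", "er_crc", "er_enc_out")

def port_counters(switch, port):
    """Return a dict mapping counter name to value for the given port."""
    out = subprocess.check_output(
        ["ssh", "admin@" + switch, "portstatsshow", str(port)],
        universal_newlines=True)
    stats = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in COUNTERS:
            stats[fields[0]] = int(fields[1])
    return stats

if __name__ == "__main__":
    stats = port_counters(SWITCH, PORT)
    for name in COUNTERS:
        print("%-12s %s" % (name, stats.get(name, "n/a")))
    # In the incident above, bad_os rose while the CRC and encoding
    # counters stayed flat.
    if stats.get("er_bad_os", 0) > 0:
        sys.exit(1)   # non-zero exit so a cron job or monitor can alert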

At 10:55, in syslog:
raslogd: 2011/09/30-10:55:02, [FW-1424], 2118, WWN 10:... | FID 128, WARNING, IBM_2005_B32_B, Switch status changed from HEALTHY to MARGINAL
At the same time, there was a slightly larger spike in the bad_os graph.
Coinciding with this, the storage system sent a mail warning about "Data rate negotiation failed" for the port.

At 17:00, the bit-traffic count flat-lined (graph not shown), i.e. all traffic through the port had ceased.

At no point did the graphs for C3 discards, encoding errors or CRC errors show any spikes.

The next morning, the optical cable involved was replaced; that didn't help. Inserting another SFP did help, leading to the conclusion that the old SFP had started to malfunction.

Moral: Make sure not to just keep spare cables around. A spare SFP should also be kept in stock.
And monitor your systems: A centralized and regularly inspected syslog is invaluable. Generating graphs for key counters is also mandatory for mature systems operation; one way to collect and display key counts for Brocade switches is to use Munin and a Munin plugin which I wrote.
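For the curious: A Munin plugin is simply an executable which prints graph metadata when called with the argument "config", and field values otherwise. The following is a simplified, hypothetical sketch of such a plugin for the bad_os counter; it is not the plugin mentioned above, and the switch name, port number and SSH details are again assumptions:

#!/usr/bin/env python
"""Simplified sketch of a Munin plugin graphing bad_os for one Brocade
switch port. Not the plugin mentioned in the post; switch name, port
number and SSH access are assumptions."""

import subprocess
import sys

SWITCH = "san32b.example.org"   # hypothetical switch host name
PORT = 0

def read_bad_os():
    """Fetch er_bad_os for PORT by running 'portstatsshow' over SSH."""
    out = subprocess.check_output(
        ["ssh", "admin@" + SWITCH, "portstatsshow", str(PORT)],
        universal_newlines=True)
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "er_bad_os":
            return int(fields[1])
    return 0

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        # Munin calls the plugin with 'config' to learn how to draw the graph.
        print("graph_title Brocade port %d bad_os" % PORT)
        print("graph_vlabel invalid ordered sets")
        print("graph_category san")
        print("bad_os.label bad_os")
        print("bad_os.type COUNTER")   # let Munin plot the rate of change
    else:
        # Normal invocation: emit the current counter value.
        print("bad_os.value %d" % read_bad_os())

Dropped into /etc/munin/plugins/ and made executable, munin-node will sample it on its usual five-minute schedule.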

PS: Brocade documentation states that SFP problems might result in a combination of rising CRC errors and encoding/disparity errors. That did not happen in this situation.

Tuesday, May 26, 2009

Storage system experiences

Background
Some people dislike Storage Area Networks (SANs). This may be due to complexity; and it may be because SAN management is often delegated to a special group of administrators who may then become an organizational bottleneck.

Personally, I very much like SANs. I do not miss messing around with disks and cables and enclosures in the server room. In my opinion, the complexity of a SAN is easily balanced out by easy and timely allocation of storage chunks to servers. And as a database administrator, I'm very fond of being able to share RAIDs between several servers, resulting in storage backed by a large number of spindles. Also, it's nice to be able to monitor I/O activity across a large set of systems.

At work, we use two different FC SANs. One SAN connects Windows/ESX/Linux Dell servers to a Hitachi AMS500 storage system, while another SAN connects IBM servers to IBM DS4800 controlled storage.

I've had 2½ years of experience with the IBM storage system, which holds 24TB of data. We've had the AMS500 for half a year; it stores around 20TB of data. Both storage systems are configured with RAID5/6 logical drives with varying speed profiles, for different kinds of use patterns.

GUI management interface
The GUI management interface for the DS4800, Storage Manager (SM), is vastly superior to the AMS500's Storage Manager Modular (SMM). SM lets you assign text labels to your LUs, while SMM only works with numeric LUNs; this means that SM lets you get away with less separately maintained systems documentation. Also, it's much easier to view snapshots of I/O activity in IBM's Storage Manager. And IBM's Storage Manager was very easy to install, while Hitachi's Storage Manager Modular required a fair amount of tweaks during installation due to Java runtime issues (seemingly because Hitachi's software is bundled with an ancient JRE).

Command line interface
For both storage systems, the command line tools use rather awkward syntax and calling conventions. I wish that the programmers of these tools would lean more towards modern Linux/Unix command line conventions.

Health monitoring
For both storage systems, you are encouraged to install software which regularly checks storage system health; in case of trouble, it alerts you by e-mail and "phones home" to the respective support organizations. Again, the IBM software is easy to install, while we had problems getting the Hitachi software to install on contemporary server software.

The AMS500 lets you turn on SNMP, so that you can easily poll for the health of the system from your central monitoring system; nice. DS4800 doesn't seem to offer this.
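As an illustration, a central poller could do something like the sketch below, using Net-SNMP's snmpget. Only standard MIB-II objects are queried here, as a basic liveness check; the AMS-specific health objects live in Hitachi's own MIB, which I won't quote from memory, so the placeholder has to be filled in from the MIB file. Host name and community string are made up:

#!/usr/bin/env python
"""Hedged sketch: poll an AMS500 over SNMP using Net-SNMP's snmpget.
Only standard MIB-II objects are queried; the array-specific health
objects are in Hitachi's own MIB, so the placeholder OID below must be
filled in from the MIB file. Host name and community are made up."""

import subprocess

ARRAY = "ams500.example.org"     # hypothetical management host name
COMMUNITY = "public"             # read-only community configured on the array

OIDS = [
    ("sysDescr",  "1.3.6.1.2.1.1.1.0"),
    ("sysUpTime", "1.3.6.1.2.1.1.3.0"),
    # ("arrayHealth", "<OID from the Hitachi AMS MIB goes here>"),
]

def snmp_get(host, community, oid):
    """Return the value portion of an snmpget reply."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, "-Ovq", host, oid],
        universal_newlines=True)
    return out.strip()

if __name__ == "__main__":
    for name, oid in OIDS:
        print("%s: %s" % (name, snmp_get(ARRAY, COMMUNITY, oid)))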

Stability
During my 2½ years with the DS4800, we have had three serious breakdowns: Two related to firmware trouble, and one related to a controller hardware defect. This is too much, I think.
We haven't experienced instability with the AMS500 during the half year that we've had it in production.

UPDATE Dec 2009: We got our first stability problem with AMS500 :-(
A misconfigured ESX cluster generated a large number of I/O requests for a LUN which had been deleted, and the AMS500 started getting periodic absence seizures. In the AMS500's defence: Had we been up to date with regard to AMS500 firmware, this wouldn't have caused trouble.

Performance
Benchmarking storage systems is hard, especially because you normally don't have the luxury of being able to shut down all other I/O than that generated by the benchmark. So although I've conducted an extensive set of performance measurements, I can only say that the systems seem to perform equally well.

One difference, though: Hitachi recommends that you stick with rather narrow RAIDs; if needed, these may then be joined into larger storage areas using a special LUSE feature, or using logical volume management at the hosts. This is somewhat annoying: We would generally like to have wide RAIDs comprising a large number of spindles; AMS500 makes it a bit more complicated to meet this goal.
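For the host-side LVM route, the idea is simply to stripe a logical volume across several narrow LUs. A minimal sketch, with entirely hypothetical multipath device names, might look like this (run as root, and only on LUs known to be empty):

#!/usr/bin/env python
"""Hedged sketch: join two narrow AMS500 LUs into one striped logical
volume on a Linux host with LVM, instead of using Hitachi's LUSE.
The multipath device names are hypothetical."""

import subprocess

LUNS = ["/dev/mapper/ams_lu0", "/dev/mapper/ams_lu1"]   # hypothetical names
VG = "vg_ams"
LV = "lv_wide"

def run(*cmd):
    """Print and execute a command, aborting on failure."""
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

if __name__ == "__main__":
    # Label each LU as an LVM physical volume.
    run("pvcreate", *LUNS)
    # Collect the LUs in one volume group.
    run("vgcreate", VG, *LUNS)
    # Create a logical volume striped across all LUs, using all free space.
    run("lvcreate", "-i", str(len(LUNS)), "-I", "256",
        "-l", "100%FREE", "-n", LV, VG)

The 256KB stripe size is just an example; pick something that matches the segment size of the underlying RAIDs.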

Price
At the time of our AMS500 acquisition, Hitachi's storage system was substantially less expensive than IBM's comparable offerings. The comparison may be a bit unfair, though, because our procurement is tightly controlled by government procurement contracts.

Support
We haven't used IBM's support offerings much, because we have historically used consultants from an IBM partner for support of our DS4800. But regarding software and documentation, IBM's support is mostly very good: It's easy to download updates for firmware, management software, and documentation. IBM's documentation of how to use DS4800 multipathing with Linux is inconsistent and confusing, though.

Hitachi's support is a mixed experience. They have appointed a technician from their Danish office to us, and this works very well: He is easy to get in touch with, and he provides good answers. On the other hand, Hitachi's distribution of software and documentation is miserable: The software is hidden behind a confusing extranet, and even after months of mail/phone correspondence with the extranet support (and other parts of Hitachi's organization), we haven't been able to log in and download software or documentation. So we have resorted to asking our tech contact to send us CDs via snail mail once in a while. Another Hitachi annoyance is licence keys: Why on Earth do we need to enter licence keys when installing multipath driver software (on Windows)? As if there would ever be a black market for that kind of software. And when the going gets tough in operations (e.g. if a new server needs to be installed quickly after a server breakdown), it's frustrating to have to spend time trying to dig out the CD with the licence keys. Argh! What are they thinking?

Multipathing
Multipathing with the DS4800 works well with Windows and PowerVM servers, but we never got it to work perfectly with Linux on Intel. It's strange that IBM puts a significant amount of work into Linux, but still can't make it simple and easy to integrate an important storage product with Linux.

Multipathing with the AMS500 works well (and out of the box) on Linux and ESX if you follow certain conventions (Red Hat knowledgebase article; Hitachi ESX configuration document). However, Hitachi's Windows HDLM drivers have trouble discovering new LUs, resulting in the need for several reboots when a new LU is mapped to a Windows host; once discovery is up and running, things work fine with Windows, too.

Overall experiences
Pro IBM: Good management software, simple distribution of software and documentation.
Con IBM: High price. More breakdowns than should be expected, I think.

Pro Hitachi: Good price, good stability so far.
Con Hitachi: Bad management software, very bad distribution of software and documentation.