For some, it may be as easy as scanning Gartner’s Magic Quadrant for this year’s top performers or the breakthrough up-and-comers. What about transport methods: will you favor iSCSI, Fibre Channel, Fibre Channel over Ethernet, or SAS? Not to mention that you must have solid state drives; it’s the hot technology, everyone is using it, and you don’t want to fall behind!
If, on the other hand, you are truly interested in investing in only what you need to get the job done efficiently, then selecting the right storage should start with intimately knowing the I/O patterns of the applications that will access the data. The most important metrics to consider are reads, writes, and I/O size. You can collect this data with performance tools such as perfmon for Windows or iostat for Linux. These are just two among many, and everyone has a favorite. Be sure to measure across all workloads, since peak and off-peak periods may have different I/O characteristics. Pay attention to the following counters: Disk Reads/sec, Disk Writes/sec, Disk Latency, the size of the I/Os being issued, and Disk Queue Length. If you are analyzing a database, also include Checkpoint pages/sec and Page Reads/sec.
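Once the counters are collected, they can be boiled down to the handful of numbers that drive sizing. The following is a minimal sketch, assuming the samples have already been exported from your tool of choice; the `Sample` fields and the `summarize` function are hypothetical names for illustration, not part of perfmon or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement interval (hypothetical layout for exported counters)."""
    reads_per_sec: float
    writes_per_sec: float
    read_bytes_per_sec: float
    write_bytes_per_sec: float


def summarize(samples):
    """Reduce a list of samples to average/peak IOPS, read mix, and I/O size."""
    total_reads = sum(s.reads_per_sec for s in samples)
    total_writes = sum(s.writes_per_sec for s in samples)
    total_ios = total_reads + total_writes
    total_bytes = sum(s.read_bytes_per_sec + s.write_bytes_per_sec for s in samples)
    return {
        "avg_iops": total_ios / len(samples),
        "peak_iops": max(s.reads_per_sec + s.writes_per_sec for s in samples),
        # Fraction of all I/Os that are reads (0.0 if nothing was measured).
        "read_fraction": total_reads / total_ios if total_ios else 0.0,
        # Average I/O size = bytes transferred / number of I/Os.
        "avg_io_size_kib": total_bytes / total_ios / 1024 if total_ios else 0.0,
    }


# Made-up example: an off-peak and a peak interval, all I/Os 8 KiB.
samples = [
    Sample(300, 100, 300 * 8192, 100 * 8192),
    Sample(600, 200, 600 * 8192, 200 * 8192),
]
print(summarize(samples))
```

Running the summary separately for peak and off-peak windows makes it easy to see whether the read/write mix or the I/O size shifts between them.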
Once you have a solid idea of how your applications perform, you can move on to sizing the physical disks. Typical IOPS per spindle range from 100-130 on 10K RPM drives, 150-180 on 15K RPM drives, and 5000+ on solid state drives. Keep in mind that disks filled to less than 50-70% of capacity will deliver better IOPS than disks filled to 80%, so spread the data out! You will also want to consider the write penalty that your RAID choice imposes on the I/O issued against the disks. Read and write cache on storage controllers can improve performance and should be taken into consideration if the applications lean heavily one way or the other.
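To make the arithmetic concrete, here is a rough sketch of a spindle-count estimate. It assumes the commonly cited RAID write penalties (1 for RAID 0, 2 for RAID 10, 4 for RAID 5, 6 for RAID 6) and ignores controller cache entirely; the function names and the workload figures are made up for illustration.

```python
import math

# Commonly cited back-end I/Os generated per front-end write (an assumption
# of this sketch; real arrays and caches can change the effective penalty).
RAID_WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(read_iops, write_iops, raid_level):
    """I/Os the disks must service once the RAID write penalty is applied."""
    return read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]


def spindles_needed(read_iops, write_iops, raid_level, iops_per_spindle):
    """Minimum number of disks to satisfy the back-end IOPS, rounded up."""
    total = backend_iops(read_iops, write_iops, raid_level)
    return math.ceil(total / iops_per_spindle)


# Made-up workload: 1400 reads/sec and 600 writes/sec at peak,
# on 15K RPM drives at the low end of the range (150 IOPS each).
print(spindles_needed(1400, 600, "raid5", 150))   # 26
print(spindles_needed(1400, 600, "raid10", 150))  # 18
```

The same workload needs noticeably fewer spindles on RAID 10 than on RAID 5, which is why the write share of the workload matters so much to the RAID decision.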
Sizing storage systems is much more than just sizing the disks. Every component in the path to the physical drives has a throughput limit and can be a potential bottleneck. As more solid state disks are implemented, the configuration of these ancillary components will become more critical. To avoid pitfalls, analyze the potential throughput of each of the following:
- Connectivity: HBAs, NICs, switch ports (and whether they are shared by multiple servers), array ports, and the number of paths between servers and storage.
- The number of service processors in the array and how the LUNs are balanced across them.
- The capacity of the backend busses on the array and how the physical disks are balanced across them.
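One simple way to reason about the components listed above is to treat the path as a chain whose throughput is capped by its slowest link. The sketch below assumes you have already worked out an aggregate throughput limit for each component; the component names, the MB/s figures, and both helper functions are hypothetical.

```python
def effective_share(limit_mb_s, sharers):
    """Throughput available to one server when a component is shared evenly."""
    return limit_mb_s / sharers


def path_bottleneck(limits_mb_s):
    """Return (component, limit) for the slowest component in the path."""
    name = min(limits_mb_s, key=limits_mb_s.get)
    return name, limits_mb_s[name]


# Made-up aggregate limits in MB/s for one server's path to its storage.
path = {
    "host HBAs": 1600,
    "switch ports": effective_share(3200, 4),  # shared by four servers
    "array ports": 3200,
    "service processors": 2400,
    "backend buses": 1200,
}
print(path_bottleneck(path))  # ('switch ports', 800.0)
```

In this made-up case the shared switch ports, not the disks or the backend buses, limit the server, which is the kind of pitfall the list is meant to catch.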
Other considerations that can affect sizing decisions are the advanced features offered by the latest generation of storage arrays. Examples are thin provisioning, snapshots/clones, compression, deduplication, and storage-based replication. In addition, some new-generation arrays use technology that throws all those media IOPS estimates out the window!
The moral of the story is this: arming yourself with in-depth knowledge of your application performance gives you the ability to quantify different array features and purchase only what you really need. Arrays that tout doing everything come with a hefty price tag. If they don’t benefit your applications, they are worthless, and saving money where it makes sense has never been a bad business decision.