Thursday, July 30, 2015

Have You Considered Centralized Storage?

If you’re seeking a centralized storage device that provides traditional block-level storage for your physical or virtual servers while simultaneously providing a central platform for file-level data storage, then you’re looking for a Unified Storage device.  Traditionally, these devices contain no hardware that you wouldn’t already find in a block-level SAN; they are SANs with a NAS software feature.  One way to loosely conceptualize the difference between a NAS and a SAN is that a NAS appears to a client operating system as a file server (a client can map network drives to shares on that server), whereas a disk presented through a SAN still appears to the client OS as a disk: it is visible in disk and volume management utilities alongside the client’s local disks, and is available to be formatted with a file system and mounted. The common connectivity protocols for a SAN are Fibre Channel and iSCSI, whereas the popular NAS connectivity protocols are NFS and CIFS.  Unified Storage replaces file servers and consolidates data for applications and virtual servers onto a single platform.
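The block-versus-file distinction can be sketched in a few lines of Python. This is a conceptual illustration only – a temporary file stands in for the raw device a SAN would present, and a temporary directory stands in for a NAS share; the function names are invented for the example:

```python
import os
import tempfile

# Block-level access (SAN-style): the client sees a raw device and
# reads/writes fixed-size blocks at byte offsets; any file system on
# top of the device is the *client's* responsibility.
def write_block(dev_path, block_num, data, block_size=4096):
    """Write one fixed-size block at offset block_num * block_size."""
    data = data.ljust(block_size, b"\x00")[:block_size]
    fd = os.open(dev_path, os.O_RDWR)
    try:
        os.pwrite(fd, data, block_num * block_size)
    finally:
        os.close(fd)

def read_block(dev_path, block_num, block_size=4096):
    """Read one fixed-size block back from the device."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        return os.pread(fd, block_size, block_num * block_size)
    finally:
        os.close(fd)

# File-level access (NAS-style): the client just names a path on a
# share; the server's file system decides where the blocks land.
def write_file(share_path, name, data):
    with open(os.path.join(share_path, name), "wb") as f:
        f.write(data)

# Demo: a temp file plays the "LUN", a temp dir plays the "share".
with tempfile.TemporaryDirectory() as share:
    dev = os.path.join(share, "fake_device.img")
    with open(dev, "wb") as f:
        f.truncate(1024 * 1024)          # 1 MiB stand-in device
    write_block(dev, 3, b"hello block")
    print(read_block(dev, 3)[:11])
    write_file(share, "report.docx", b"hello file")
```

The point of the sketch is the division of labor: with block access the client manages structure on top of raw offsets, while with file access the server does.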

NAS-specific storage devices are plentiful in the market, and many offer large, inexpensive, and redundant disk capacities along with features usually found in a block-level SAN.  Vendor examples include HP’s StoreEasy product family, which pairs Microsoft Windows as a single-solution OS with a RAID array, and a crowded middle market of Linux-based NFS disk devices from vendors such as Synology, Transporter, and QNAP.   However, many of these products do not scale beyond their initial capacity and can present too much failure risk for an Enterprise network. Unified Storage can provide the cost savings and simplicity of consolidating storage over an existing network, the efficiency of tiered storage, and the flexibility required by virtual server environments.


Potential cost savings attract IT buyers to Unified Storage devices because, while not every network requires a SAN, most networks have some flavor of a NAS concept implemented.   End users need a shared, central location for storing heaps of documents and other data, and a common implementation is Microsoft-based file shares (accessed via the SMB protocol, often still known by its legacy name, CIFS).   As networks grow and age, these file shares rarely shrink in size, which not only causes daily management consternation for the IT Admins, but also increases the vulnerability and importance of the network’s most-coveted data.  File data has to be placed somewhere – why not give it the same efficiency and high-availability features of a SAN?  
Most of the popular Enterprise storage vendors have a Unified option.  EMC’s current VNX-series generation can dedicate RAM and CPU resources to Data Movers that control CIFS traffic to dedicated LUNs.  NetApp took the reverse route to a Unified option, adding block-level capabilities to their popular Filer series in 2002.   3PAR StoreServ, now owned by HP, offers the File Persona software suite to complement its Block Persona technology. 
Management of Unified Storage is not complex. Competition among hardware vendors benefits buyers: most products offer a single pane-of-glass interface that pairs NAS functionality with day-to-day LUN provisioning for the SAN.  Security and access integration with Microsoft Active Directory file permissions can get complex, but only for the most rigorous of security initiatives. 


Counter arguments to Unified Storage systems usually hover around perceived compromises in performance, because file-based I/O is structurally different from block-based I/O.   If a block-based application shares a system with more dynamic, file-based access, users can experience variability in performance because of the resources allocated to the file-based side.  Consistency in disk performance is paramount for many environments, so Unified Storage can be considered a potential risk.

Many IT Departments see Unified Storage devices as unnecessary, as most already have Windows Server virtual machines providing CIFS-based file shares from vdisks on their existing block storage LUNs.  The vdisks can scale easily: an NTFS volume can grow to 16 TB with the default 4 KB cluster size, and to as much as 256 TB with 64 KB clusters.  Most Windows-based networks also use Microsoft’s Group Policy to centrally manage file share access, and adding another abstraction layer with a NAS device can create complexity where it is not desired.
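Those NTFS ceilings fall out of the file system’s 32-bit cluster addressing: a volume can hold at most 2^32 − 1 clusters, so the maximum size is just the cluster count times the cluster size. A quick sketch of the arithmetic:

```python
# NTFS addresses at most 2**32 - 1 clusters per volume, so the maximum
# volume size is (max cluster count) * (cluster size).
MAX_CLUSTERS = 2**32 - 1

def ntfs_max_volume_tb(cluster_size_kb):
    """Maximum NTFS volume size in TiB for a given cluster size (KB)."""
    return MAX_CLUSTERS * cluster_size_kb * 1024 / 2**40

print(f"{ntfs_max_volume_tb(4):.0f} TiB")   # default 4 KB clusters -> ~16 TiB
print(f"{ntfs_max_volume_tb(64):.0f} TiB")  # 64 KB clusters -> ~256 TiB
```

So the 16 TB figure is a default-format limit, not a hard NTFS limit; reformatting with larger clusters raises the ceiling.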

Even with a Unified solution, administrators may still need to manage SAN and NAS storage as separate silos.  This complicates administration, forcing them to predict future storage needs for each silo and manage each set of requirements separately.  And since file-level storage is traditionally placed on less-expensive SATA disk, the Unified solution will eventually be constrained by the limitations of the platform as a whole, which can rapidly accelerate a storage refresh cycle. 

Final Thoughts

Realistically, most Unified Storage Systems are better at one capability than the other.  This means they are a NAS that figured out a way to provide block storage (NetApp), or they are block storage with some sort of NAS function integrated (EMC).  Although most environments will use a mixture of workloads, a particular workload will often be the most important. Make sure that you test the specific conditions and configurations that will be most important to you.

Thursday, July 23, 2015

Avoiding the ‘Jack of All Trades, Master of None’ Approach

As a member of a professional services group within an IT sales organization, my team’s focus is on evaluating our customers’ business problems and engineering solutions, in the form of the products and services we offer, to fix them.  We are, by definition, “problem solvers.”  This reminds me of the old adage, “there is more than one way to skin a cat.”  Well Mr. Customer, there is more than one way to solve your storage issue, more than one way to clear up that excessive network traffic, and more than one way to leverage virtualization to increase consolidation and availability.

Between application and operating system software, networking, compute, and storage hardware, there are hundreds of thousands of products one can choose from.  How can we be experts in all of them?  Quite simply, we can’t, you can’t, and no one organization can, without employing hundreds of people to match the product counts. It’s just not feasible in most IT sales organizations, which are typically small to medium sized businesses. If they aren’t careful, this leads to simply having employees with about a kiddie pool’s depth of knowledge in a vast number of technologies instead of specializing in a select few.

Since we want to avoid this “jack of all trades” approach, we have chosen to be diligent in our selection of the hardware and software products that we recommend to business owners so that we can “master” those that we do offer.  As a sales team, we must be able to show the value or ROI of the purchase, implement the solutions in professional services roles, and support the customer post-sale with knowledge and possibly more professional services as the environment grows.  As an organization, we must maintain strong relationships with the manufacturer’s technical resources to overcome problems that will most certainly arise.

Our engineers spend their hours on training, using, patching, identifying bugs as well as fixes, and vetting the products so that we can solve problems before or as they occur.  We choose to hone our skills on a concise portfolio because we know the dangers of trying to be everything to everyone.  Even with all our focused efforts, do we know it all?  Not a chance, but we will also work for you to find answers when we don’t.  While there are many similarities we can assimilate, every environment our products go into will have its own priorities, flaws, designs, and problems.

Outside of buying the tangible product, we find that many organizations struggle because they set out to install new applications or hardware without sufficiently planning for the intangible side: Implementation!  Likely your pre-planning includes how your own IT staff will manage the products in your environment once they are installed, but does that mean they can implement without any experience or manage it without training?  Just like engineers, IT staff members can’t possibly know how to use every product either.  There are not enough hours in the day.   The actual cost to a business to fix a botched implementation, not only in manpower but also in lost productivity or downtime, far exceeds the cost to plan and include the services of expert users ahead of time. Using experienced engineers in an onsite or remote fashion for best practices and “I learned it the hard way” advice to your staff is valuable training that you won’t get from a pre-installation checklist.

Great Lakes Computer is proud to offer our customers professional services, including, but not limited to the following products in our portfolio: VMware, Microsoft, Juniper Networks, Palo Alto Networks, Aruba Networks, Cisco, Fortinet, Nimble Storage, Pure Storage, Veeam, and Unitrends solutions.   Please consider engaging our engineering staff on the front end, in tandem with your IT administrators, or as a supplement to a short staff that might not have the resources to get the project up and running to meet deadlines.   We truly are a partnership that can springboard your next implementation to success!

Thursday, July 16, 2015

Dual Controllers, Single Point of Failure?

We’ve all (hopefully) heard the term before, “Single Point of Failure.”  This phrase strikes fear in the hearts of people in management, the idea that if this one important resource has issues, then everything dependent on it fails.  It’s the weak link, the gremlin in your environment, and according to our old friend Murphy, “Anything that can go wrong will go wrong.”  And you better believe this SPOF gremlin is going to rear its ugly head at the most inopportune and painful time – just ask any veteran IT professional.

So how do we combat these SPOF gremlins?  We build in redundancy, we limit failure domains, we vigilantly monitor our environments, and we alert on any changes or anomalies.  So when failures do occur, we have either an automatic failover or a near-immediate solution that keeps our users happily clicking away.

So let’s apply this to the topic of storage, specifically a storage array.  Forget the network connections to the array for now; let’s hone in on the modern storage array chassis itself.  They are often equipped with multiple network connections, power supplies, disks, processors, memory banks, etc.  The vendor assures us: “We have dual controllers, everything is mirrored the instant it is brought into the array, so this is not a single point of failure.”

So are they correct?  Will a dual controller storage array be able to keep the SPOF gremlins at bay?  I wish I could give you a conclusive answer, because I suspect that some storage manufacturers are nearing the point where the odds of a failure bringing an entire dual controller array down are comically small.  But let’s ponder this…and I’m speaking from a painful past experience here: what is protecting you from failures within the operating system that runs the array?  “We have the best engineers in the industry,”  “We run our revisions through rigorous tests to ensure stability,” and “We guarantee 99.999% uptime.”

Interestingly enough, five-nines of reliability still allows for up to about 5 minutes and 15 seconds of downtime a year.  Think of the damage a SPOF gremlin could do in that amount of time – yeah, it will be painful and likely take longer than 5 minutes to fully recover.
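The arithmetic behind that figure is worth a quick sanity check – the downtime budget is just the unavailable fraction of a 365-day year:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def downtime_per_year(availability_pct):
    """Allowed downtime in seconds per year for a given availability %."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_per_year(pct) / 60:.1f} minutes/year")
# Five nines works out to roughly 315 seconds: 5 minutes, 15 seconds.
```

Notice how steeply the budget shrinks with each nine – 99% allows over three and a half days of downtime per year, while five nines allows barely five minutes.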

So what do we do?  Well, if you have high-tier workloads that require constant uptime, then it’s probably a good idea to look at replica technology.  Storage arrays often include some form of storage replication as a built-in feature.  If the array doesn’t offer it, replication features built into hypervisors and third-party applications can provide a similar solution.

My best advice is: continue to be vigilant with your monitoring and don’t let your guard down.  Those gremlins are out there somewhere, and when they strike, you need to be ready.  Let us help with planning your defenses and maintaining your uptime goals.  We have the expertise to identify the single points of failure (they can be very sneaky) and how to combat them.  After all, if you take on the gremlins yourself, could you be considered a SPOF?

Thursday, July 9, 2015

Troubleshooting Performance Bottlenecks with Per-VM Monitoring

Nimble Storage has offered Per-VM monitoring within InfoSight since April, included without requiring the purchase of an additional option or component. All that is required to enable Per-VM monitoring is to register your Nimble Storage array with vCenter (Administration > vCenter Plugin using the array management interface if you have not already) and enable Stream Data in InfoSight (Administration > Virtual Environment in InfoSight). Per-VM monitoring, or Virtual Environment, can be found under the Manage menu item in InfoSight.

The first thing you will notice is an inventory tree on the left with icons for Hosts and Clusters, Virtual Machines, and Storage.

Next, you will notice the content section with headers for Host Activity, Top VMs, Datastore Treemap, Inactive VMs, and Nimble Arrays.
  • Host Activity provides a list of your vSphere hosts and their recent performance metrics
  • Top VMs lists the ten busiest virtual machines over the past 24 hours by I/O and latency
  • Datastore Treemap displays heat maps to compare the performance of virtual machines
  • Inactive VMs lists all virtual machines that have not generated any I/O in the past seven days
  • Nimble Arrays provides a list of Nimble Storage arrays registered with vCenter
All of the reports are pretty self-explanatory, but Datastore Treemap may be the most unique and beneficial of the bunch. The heat map sizes each virtual machine by total I/O, colors it based on observed latency, and groups virtual machines by datastore.

Each square represents a virtual machine. This lets us see which virtual machines are producing the most I/O and easily compare them to the other virtual machines that share their datastore. The redder the square, the higher the average latency; hovering the cursor over a square displays a popover with the detailed figures, and clicking the virtual machine name in the popover brings up the historical performance details for that virtual machine.
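The treemap’s size-by-I/O, color-by-latency grouping can be sketched in a few lines. This is a hypothetical simplification for illustration – the VM records and latency thresholds below are invented, not Nimble’s actual implementation:

```python
from collections import defaultdict

# Hypothetical per-VM samples: (vm name, datastore, total IOPS, avg latency ms)
vms = [
    ("sql01",  "DS-Gold",   4200, 1.2),
    ("web01",  "DS-Silver",  800, 0.6),
    ("backup", "DS-Silver", 3900, 9.8),
    ("dc01",   "DS-Gold",    150, 0.4),
]

def latency_color(ms):
    """Map average latency to a heat-map color bucket (thresholds invented)."""
    if ms < 1:
        return "green"
    if ms < 5:
        return "yellow"
    return "red"

# Group VMs by datastore, then size each square relative to its
# datastore's total I/O -- the busiest VM dominates its group.
by_ds = defaultdict(list)
for name, ds, iops, lat in vms:
    by_ds[ds].append((name, iops, lat))

for ds, members in sorted(by_ds.items()):
    total = sum(iops for _, iops, _ in members)
    for name, iops, lat in members:
        print(f"{ds}: {name} size={iops / total:.0%} color={latency_color(lat)}")
```

Even in this toy version, a VM like the hypothetical “backup” stands out immediately: it dominates its datastore by I/O share and turns red on latency.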

Now we can adjust the timeframe to narrow down to a time of reported slowness. In this example, we see that the primary factor in the latency spikes is network bottlenecks. As we look at the spikes, we also notice that they always occur on a Saturday - which also happens to be the day that we perform full backups of our environment.

Below the graph of Virtual Machine Latency, we also see graphs for: Host Performance, Datastore Analysis, and Active Neighbor Analysis.
