Improving Storage Infrastructure Utilization
April 2007
Introduction
Endless capital requests for additional storage resources are the bane of existence for many
storage directors and infrastructure VPs. The seemingly unquenchable demand for more space,
combined with the nagging feeling that existing space is not being fully used, has triggered many an
internal project aimed at examining storage utilization issues. All too often, these projects
attempt to chase the metric of file-level utilization. Unfortunately, this may not be the right
approach to solving utilization problems. Especially for large, complex storage environments,
attempting to track file utilization on a global basis may be doomed to failure. This white paper
proposes a more effective approach to solving the utilization challenge in complex SANs.
Why Underutilization Happens
What leads application owners to over-request storage? Through experience helping corporations
gain control over networked storage assets, Onaro® has observed the following reasons:
· Fear of the 2:00 AM page – It only takes one experience of being paged in the middle of
the night due to insufficient database storage for a DBA to forever over-request space.
· Uncertainty of actual demand – Many application teams do not have a clear idea of
exactly how much space their application will require. Factor in transient storage
demands and the picture gets cloudier.
· Lack of a service level for provisioning time – Without an agreed-upon time frame to
provision new storage, application owners face an open-ended schedule for getting new
storage in a crisis. As a result, storage provisioning times are uneven at best, and teams
over-request to compensate for the unpredictable lead time.
· Career disincentive outweighs financial incentive – The career risk of running out of
space outweighs any financial incentive to estimate storage requirements conservatively.
Without a simple cost allocation system, it is always easier to over-request space.
· Space request multiplication – Space request formulas start with the application team
requesting “x” amount of space. Then the system administrator doubles that amount to
avoid being awakened in the middle of the night due to insufficient storage. Finally, the
storage team adds another 20-30% to prevent having to scramble to add more space due
to the application team’s underestimation of requirements. (A short sketch of how these
factors compound appears after this list.)
· Uncertainty over actual loads in the SAN fabric and arrays – Without the ability to
understand exactly how applications are loading the SAN fabric and arrays, many
storage teams hesitate to push the envelope on both port and array utilization. The result
is underutilization of switch ports and arrays that are not fully allocated.
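To make the compounding concrete, here is a minimal sketch of the space request multiplication pattern described above. The doubling factor and the 20-30% buffer come from the text; the 1TB starting request is purely illustrative.

```python
# Illustrative arithmetic for the "space request multiplication" pattern.
# The doubling and the 20-30% buffer come from the text; the starting
# request size is an assumption for the example.

def multiplied_request(app_request_tb, sysadmin_factor=2.0, storage_buffer=0.25):
    """Final allocation after each team pads the original request."""
    after_sysadmin = app_request_tb * sysadmin_factor   # sysadmin doubles it
    return after_sysadmin * (1.0 + storage_buffer)      # storage team adds 20-30%

request = 1.0  # application team asks for 1 TB
final = multiplied_request(request)
print(f"Requested {request:.1f} TB, provisioned {final:.1f} TB "
      f"({final / request:.1f}x the original request)")
# -> Requested 1.0 TB, provisioned 2.5 TB (2.5x the original request)
```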
The False Promise of File-Level Utilization
To solve the storage utilization challenge, many organizations are attempting to monitor
file-level utilization. In Onaro’s opinion, once the size of a SAN exceeds a few hundred ports,
the time, effort, and cost of tracking file-level utilization across the entire environment makes
this an impractical and fruitless exercise that fails to solve the core utilization problems. Why?
First, most of the utilization problem is a human behavior and operations challenge. Having
reports on file-level utilization does not address the underlying reasons for overestimating
storage requirements – i.e., the 2:00 AM page problem, the uncertainty of demand, the slowness
of provisioning new space, the lack of financial incentive, or space request multiplication.
What file-level utilization does provide is an early warning before a system runs out of space
and a feedback mechanism that helps calibrate future storage requests. But what is the cost of
globally tracking file-level utilization?
Unfortunately, understanding file utilization requires the deployment of operating system agents,
along with the supporting infrastructure to manage and interrogate these agents. In the majority
of organizations with whom Onaro has worked, the ability to successfully deploy agents drops
off dramatically once the host count rises to about 70-80. The complexity of maintaining
compatible agents, operating systems, and agent control applications, combined with the
cross-functional demands of coordinating agent deployments, makes this a task fit for Sisyphus.
typical scenario observed by Onaro, an organization with more than 10,000 ports had several
full-time administrators just working on agent deployment. It took them over 18 months to
complete a full agent rollout, and at any given point in time a large percentage of agents were
not reporting. The result was a constant inability to accurately report on storage resource
consumption by application. Moreover, these agent-based systems required over 50 physical
servers to collect and analyze data – adding dramatically to the operating and capital cost
requirements.
Some systems attempt to get file utilization information without agents by logging directly into
servers’ operating systems. However, granting such access to a management system via the
Unix secure shell or the Windows management interface creates security risks that most
organizations are not willing to accept.
The bottom line is that there is no free lunch when it comes to capturing file-level utilization
information. Furthermore, replicated space makes the success of any of these approaches even
more unlikely.
Agent-based file-level utilization programs do not understand the concept of replicated volumes
for a particular application. In scenarios where a source volume is replicated 2, 3, or even 10
times, allocating this cost back to the application owner requires significant manual work.
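To see why replicas complicate cost allocation, consider a minimal sketch of charging an application for its source volume plus every replica. The $30,000/TB figure echoes the cost assumption used later in this paper; the volume size and replica count are illustrative.

```python
# Hypothetical example: an application's capital cost must cover the source
# volume and every replica. All figures are illustrative assumptions.

def allocated_cost(volume_tb, replicas, cost_per_tb):
    """Capital cost of a volume including all of its replicas."""
    copies = 1 + replicas                 # the source plus each replica
    return volume_tb * copies * cost_per_tb

# A 1 TB volume replicated 3 times at an assumed $30,000/TB:
print(allocated_cost(1.0, replicas=3, cost_per_tb=30_000))  # -> 120000.0
```

Agent-based file-level tools see only the host's view of the source volume, so this replica multiplier has to be reconstructed by hand.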
One of the promises of measuring file-level utilization is that space can be reclaimed. While
theoretically possible, the cost and downtime required to reallocate underutilized space makes
reclamation difficult. In most cases, Onaro has found that although low-utilization environments
may be uncovered, the space is rarely reclaimed. Instead, organizations can only hope that their
storage and application owners, armed with file-level utilization information, will make better
decisions on the next application provisioning cycle.
Finally, some organizations that have uncovered significant underutilized allocations are hesitant
to attempt to improve utilization without an understanding of overall array loading. That is
because increasing utilization can decrease array and switch performance. Without a great deal
of confidence in the overall load balance across the storage infrastructure, storage teams risk
application brownouts by attempting to increase file-level utilization.
A Better Approach
Given today’s prevailing trend toward overestimating space requirements and underutilizing
storage resources, VPs of infrastructure are faced with the dual challenge of reining in capital
expenditures while optimizing existing storage space. Experience proves that attempting to
monitor file-level utilization in a large, complex SAN environment fails to address this
challenge. Onaro advocates a far more focused and efficient approach that enables organizations
to cost-effectively increase the utilization of overall storage resources.
Start with Global Visibility
Most organizations that attempt to rein in their underutilized storage assets do not have a global,
macro-level view of what assets are allocated to each business unit or application, let alone a
micro-level view of how much disk space is actually utilized. For environments where
replication is used, a macro-level view of allocated resources is even more lacking.
SANscreen® Foundation combined with SANscreen Replication Assurance can provide the
macro-level view of exactly which assets an application is using. Since SANscreen is
service-aware, it understands all the resources required to deliver the necessary service to an
application. Since SANscreen is agentless and does not interrogate host applications, a typical
1,000- to 5,000-port datacenter can be up and operating with global visibility in about 8 hours.
Focus Your File Utilization Efforts on the Biggest Offenders
With SANscreen providing global visibility into the storage environment, storage teams can
focus their attention on the worst offenders for underutilized space. Storage teams should
determine the minimum amount of space they need to recover from an application to make their
efforts cost-effective. Is it 1TB? 5TB? 500GB? By understanding the cost to recover storage
space and the minimum amount of space needed to deliver the desired ROI, storage teams can
then focus on determining file-level utilization for applications that meet these criteria.
Let’s assume that recovering a 1TB block of space from an application makes economic sense.
1TB of space costs about $30,000. Factoring in all the labor costs and application downtime, this
could be the right amount of space necessary to cost-justify the effort. If the target space
utilization is 50% and the assumed current utilization is about 20%, then any application that is
consuming more than 2TB should be investigated to determine file utilization.
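The arithmetic behind that 2TB threshold can be written out as a quick sketch. The 20% current utilization, 50% target, and 1TB recovery minimum come from the text; the formula itself is an interpretation of the example.

```python
# Working through the threshold above: at 20% current utilization and a 50%
# target, 60% of any allocation is reclaimable, so roughly 1 / 0.6 = 1.7 TB
# allocated (about 2 TB) yields the 1 TB recovery minimum.

def reclaimable_tb(allocated_tb, current_util, target_util):
    """Space freed by shrinking an allocation until data sits at target_util."""
    used = allocated_tb * current_util
    needed = used / target_util        # allocation required at the target
    return allocated_tb - needed

print(reclaimable_tb(2.0, current_util=0.20, target_util=0.50))  # -> 1.2
```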
To accomplish this, the storage team should work with the system administrators on a quarterly
or semi-annual basis to identify the top candidates for reallocation. This focused approach will
yield the ROI results that a global “boil-the-ocean with agents” approach will not.
Finally, SANscreen Foundation has the change management capabilities to successfully, quickly,
and safely reallocate storage space.
Provide Basic Cost Allocation Reporting
Based on Onaro’s experience, most organizations do not have formalized chargeback or even
cost-allocation mechanisms in place. But the lack of a formalized process should not stop the
storage teams from reporting on exactly how much capital cost each application is consuming.
This amount should also include the cost of both the source and target arrays. Starting with this
basic costing information puts the organization on the right path to changing storage
over-allocation behavior.
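As a sketch of what such basic reporting might look like, the snippet below rolls up an assumed volume inventory into a capital cost per application, charging each application for its source copy and every replica. The tier prices, volume records, and application names are all hypothetical.

```python
# A minimal cost-allocation report over a hypothetical volume inventory.
from collections import defaultdict

COST_PER_TB = {"tier1": 30_000, "tier2": 10_000}   # assumed capital cost ($/TB)

volumes = [
    # (application, size in TB, tier, replica count on target arrays)
    ("payroll",   2.0, "tier1", 2),
    ("data-mart", 5.0, "tier2", 1),
    ("payroll",   1.0, "tier2", 0),
]

costs = defaultdict(float)
for app, size_tb, tier, replicas in volumes:
    # Charge the application for the source copy plus every replica.
    costs[app] += size_tb * (1 + replicas) * COST_PER_TB[tier]

for app, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{app:10s} ${cost:,.0f}")
# payroll    $190,000
# data-mart  $100,000
```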
Shift Storage Between Tiers
By monitoring traffic by application over time, it is possible to determine which applications
have exceptionally low throughput requirements. Using SANscreen Application Insight
combined with the path awareness of SANscreen Foundation, storage teams can easily locate
candidates for migrating from Tier 1 to Tier 2 storage.
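A hedged sketch of how that selection might look: given per-application average throughput observed over time (hard-coded here; in practice it would come from a tool such as SANscreen Application Insight), flag everything under a chosen threshold as a Tier 2 candidate. The applications, numbers, and threshold are all illustrative.

```python
# Picking tier-migration candidates from observed average throughput.
# All data and the threshold are hypothetical assumptions for illustration.

avg_throughput_mbps = {
    "order-entry":   180.0,
    "hr-archive":      2.5,
    "log-retention":   4.1,
    "trading":       410.0,
}

TIER2_THRESHOLD_MBPS = 10.0   # below this, Tier 1 storage is likely wasted

candidates = sorted(app for app, mbps in avg_throughput_mbps.items()
                    if mbps < TIER2_THRESHOLD_MBPS)
print("Tier 2 migration candidates:", candidates)
# -> Tier 2 migration candidates: ['hr-archive', 'log-retention']
```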
Load Balance Across the SAN and Arrays
With all the focus on reclaiming storage space, infrastructure teams often overlook other areas of
significant cost savings. Many times storage teams will only provision their switching
infrastructure to 50% of available ports out of fear of saturating the fabric. The same holds true
for arrays.
Utilizing SANscreen Application Insight, storage teams can balance traffic across arrays,
switches and fabrics to maximize the allocation of these hardware assets. Load balancing across
the SAN and arrays is the low-hanging fruit in capital cost reduction. But without historical,
application-centric traffic information, load balancing is exceptionally difficult.
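As an illustration of the kind of comparison that historical traffic data enables, the sketch below computes port allocation per switch and flags outliers for rebalancing. Switch names, port counts, and the 20-point deviation threshold are assumptions for the example.

```python
# Flagging switches whose port allocation deviates sharply from the average.
# All figures are illustrative assumptions.
from statistics import mean

ports_in_use = {"switch-a": 62, "switch-b": 21, "switch-c": 58}
ports_total  = {"switch-a": 64, "switch-b": 64, "switch-c": 64}

utilization = {sw: ports_in_use[sw] / ports_total[sw] for sw in ports_total}
avg = mean(utilization.values())

for sw, u in sorted(utilization.items()):
    flag = "rebalance candidate" if abs(u - avg) > 0.20 else "ok"
    print(f"{sw}: {u:.0%} of ports allocated ({flag})")
```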
Investigate Thin Provisioning Technology
Finally, much as virtualization software and hypervisors address underutilized server CPUs,
new technologies that support thin provisioning of storage can help reduce
underutilized storage assets. This technology is not available from Onaro, but from vendors such
as 3PAR Data.
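Conceptually, thin provisioning draws physical capacity from a shared pool only as data is actually written, rather than reserving the full request up front. A vendor-neutral sketch, with purely illustrative figures:

```python
# Thick vs. thin provisioning, conceptually. Figures are illustrative.

requested_tb        = [2.0, 4.0, 1.0, 3.0]   # what each application asked for
actually_written_tb = [0.4, 0.9, 0.2, 0.6]   # what each has written so far

thick_pool = sum(requested_tb)         # capacity reserved at provisioning time
thin_pool  = sum(actually_written_tb)  # capacity consumed only when written

print(f"Thick provisioning reserves {thick_pool:.1f} TB up front")
print(f"Thin provisioning has consumed {thin_pool:.1f} TB so far")
# -> Thick provisioning reserves 10.0 TB up front
# -> Thin provisioning has consumed 2.1 TB so far
```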
Conclusion
No one likes underutilized assets of any sort. But solving the problem of underutilized storage
assets is more involved than simply tracking file-level utilization. Onaro advocates starting with
a global view of resource allocation and then focusing efforts on the greatest offenders. In
addition, taking a tiered approach to storage, implementing thin provisioning technology, and
balancing application load across all storage resources can also help optimize storage assets.
Using all of these techniques, storage teams can dramatically reduce capital costs without the
headaches associated with “boil-the-ocean” agent-based SRM systems.