Austin Data Center Project Feasibility Study
TACC-SECO Final Feasibility Report
CM1001
Prepared by
Dan Stanzione
Executive Summary
The SECO-TACC project was designed to explore the feasibility of a shared datacenter facility for Texas that could provide substantial cost and green benefits to the individual participants as well as to the state as a whole.
The Texas Advanced Computing Center (TACC) team has taken a comprehensive approach
to exploration of a shared datacenter, looking in equal parts at technology for both green
and shared datacenters, as well as the partnership and trust building required for a shared
datacenter plan.
The conclusions of this report are as follows:
• It is feasible for a shared datacenter facility to be constructed in Texas that
would involve the universities, state agencies, and corporate customers.
• A large scale shared facility would likely provide significant cost and energy
savings, as well as be an enabler for further economic development.
• There are a number of barriers that would prevent full adoption of this
facility in ways that would maximize power efficiency; nonetheless, the
adoption that would happen would still bring substantial benefits.
• Recent technological advances make a large scale shared facility possible at
dramatically higher power efficiencies than most existing datacenters.
• Barriers to shared‐system adoption are technical, legal, and psychological.
• Cloud computing trends in the private sector will continue to lower the
technical and psychological barriers over the next several years.
• Any shared datacenter facility must pay special attention to compliance with
HIPAA, FERPA, and other data protection standards.
• Rather than offering fractional servers, offering secure hosted services may
be a more successful approach.
• Shared datacenter and shared service models are already gaining traction
among in‐state universities, and TACC is facilitating some of this.
Table of Contents
Executive Summary
Table of Contents
1.0 Introduction: Vision of a high efficiency, shared datacenter for Texas
2.0 High Efficiency Datacenter Design
  2.1 Experimental partnership with Green Revolution Cooling
3.0 Datacenter Partnerships
  3.1 Exploration of partnerships with Switch Technologies
  3.2 Proposals to potential datacenter partners
4.0 Cloud/Virtualization technology evaluation
5.0 Practical issues in a shared, high efficiency datacenter
6.0 Final Recommendations
Appendix A – New datacenters around the world
Appendix B – Comparison of rack top to chilled water data centers (commissioned as part of a datacenter design for UT from HMG Associates)
Appendix C: Cost estimates for Datacenter
  C.1 Site Plan
  C.2 Preliminary Cost Estimates
  C.3 Preliminary Cost Estimates (for the base work included above)
  C.4 Projected Schedule
1.0 Introduction: Vision of a high efficiency, shared datacenter for Texas
The SECO-TACC project was designed to explore the feasibility of a shared datacenter facility for Texas that could provide substantial cost and green benefits to the individual participants as well as to the state as a whole.
The TACC team took a comprehensive approach to exploration of a shared datacenter,
looking in equal parts at technology for both green and shared datacenters, as well as the
partnership and trust building required for a shared datacenter plan.
The technology evaluation activities covered various cooling technologies, including both devices for raising efficiency that are commercially available today, such as rack door and rack top cooling units, and more experimental approaches, such as mineral-oil immersion cooling. Additional evaluation work was focused on technologies for allowing hardware sharing, specifically virtualization technologies such as Eucalyptus.
The second set of activities focused on building partnerships, both with potential datacenter customers and potential providers, to advance a plan for producing a large scale, high
efficiency, multi‐customer datacenter in Texas.
This report summarizes the activities in both categories that took place during the course
of the project. Partnership building activities included:
• Exploration of partnerships with Switch.
• Proposals to potential datacenter partners
• Datacenter tours
• Analysis of prospects for shared datacenters.
Technology evaluation activities summarized include:
• High Efficiency Datacenter Design
• Experimental partnership with Green Revolution Cooling
• Cloud/Virtualization technology evaluation
• Survey of other significant datacenter projects in academia and industry
(appendix).
2.0 High Efficiency Datacenter Design
TACC has continued investigation into the design of more efficient datacenters through the
use of the best available off the shelf cooling technologies. The current TACC datacenter,
constructed 3 years ago, already incorporates technologies beyond those used in most
conventional datacenters to achieve higher efficiency at high density. Even with racks drawing more than 30 kW each, the TACC datacenter uses in-row coolers (IRCs) from APC to bring chilled water closer to the racks, together with an enclosed hot-aisle design that further enhances the effectiveness of the IRCs. Comparisons to more traditional datacenter
designs imply that employing the IRCs reduces total cooling power for the datacenter by
around 15%. The TACC datacenter is considered a model for modern datacenter efficiency,
and during the course of the project, TACC staff have made a number of public
presentations describing our approach to high density datacenters, and seeking industrial
partners in constructing a new one.
TACC continues to investigate alternative technologies commercially available to
determine if more efficient technologies could be used. In a recent exercise with a design firm, the current commercial offerings in rack top and rack door chilling were compared with new generation IRC technology, along with investigations of alternate schemes such as importing outside air. The relatively high temperature and humidity in central Texas ruled
out the outside air options (though this approach may work well in west Texas, perhaps El
Paso, where less dehumidification would be required). The final design exercise came
down to a decision between rack door chilling units and IRCs. Ultimately, IRCs were
deemed to still be the best option for raising cooling efficiency. Appendix B contains the
engineer’s final analysis on these two options.
TACC continues to track the latest ASHRAE standards for operating conditions for new
computer equipment. Next generation computer hardware is expected to operate
effectively at significantly higher temperatures, and across a higher range of humidity
values. This new operating band may make the use of outside air a more attractive option.
The experience of TACC staff on current systems, however, shows that with current hardware even modest increases in operating temperature generate a significant increase in the rate of disk drive failure. TACC is evaluating the potential of running cluster nodes without disk drives in each node to mitigate this weakness (the Ranger cluster compute nodes use a solid state storage device, or SSD, instead of a spinning disk). The Ranger experience has shown that solid state drives are a viable option at scale for reducing disk drive temperature related failures, though the small capacity and limited write cycles of current SSDs were unpopular with the systems administration staff.
2.1 Experimental partnership with Green Revolution Cooling
In addition to evaluating off the shelf technology, TACC investigated experimental
technologies to boost datacenter efficiency that had not yet reached the commercial
marketplace (although the primary one was put on the market during the course of the
project).
The most promising of these investigations has led to the TACC partnership with Green
Revolution Cooling (GR). The GR team is designing a product that allows computing
equipment to operate while immersed in mineral oil. The high heat conductance of mineral oil makes it dramatically simpler to remove heat from the equipment, greatly reducing the cooling infrastructure required. The PUE (power usage effectiveness) of a typical
current generation commercial datacenter is often around 1.4, meaning in addition to the
power for the systems, an additional 40% power is required to run cooling systems to
remove the heat (in many older datacenters, particularly small installations, the PUE can be
as high as 2.0). TACC finds the GR technology particularly intriguing as the potential exists
to reduce the PUE below 1.0. While some power is still required for cooling, this
innovative approach allows the system fans within each computer to be shut off or
removed. This reduces the actual power to the computing equipment by 10‐15%. The
total power required for cooling is only 3‐5% with the GR approach. This means GR cooled
systems may require less power than the computers running with zero cooling via
conventional means.
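To make the arithmetic behind this claim concrete, the following back-of-the-envelope sketch compares a conventionally cooled load with a GR-cooled load whose fans have been removed. The specific fan percentage, cooling overhead, and 8 kW load are illustrative assumptions drawn from the ranges above, not measured values.

# Back-of-the-envelope comparison of conventional air cooling vs. GR immersion
# cooling. All numbers are illustrative assumptions, not measured TACC values.

it_load_w = 8000.0          # assumed IT load (an 8 kW rack), internal fans included
fan_fraction = 0.12         # assume server fans draw roughly 10-15% of IT power
gr_cooling_overhead = 0.04  # GR external cooling, roughly 3-5% of the load
conventional_pue = 1.4      # typical current-generation commercial datacenter

# Conventional air-cooled facility: fans stay on, ~40% cooling overhead.
conventional_total_w = it_load_w * conventional_pue

# GR immersion: fans removed, only a small pump/evaporative-tower overhead.
it_without_fans_w = it_load_w * (1.0 - fan_fraction)
gr_total_w = it_without_fans_w * (1.0 + gr_cooling_overhead)

print(f"Conventional total draw: {conventional_total_w:.0f} W")
print(f"GR immersion total draw: {gr_total_w:.0f} W")
# Measured against the original IT load (fans included), the ratio drops
# below 1.0, which is the sense in which a PUE below 1.0 is claimed.
print(f"Effective ratio: {gr_total_w / it_load_w:.2f}")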
TACC has entered into a partnership with GR, which resulted in the first deployed
production prototype of this technology at TACC for evaluation in April of the project year.
Between May and the end of the project, TACC operated this system with a variety of
hardware configurations, and measured both power required to cool the system, and the
reliability of both the cooling infrastructure, and the computing equipment immersed in
the mineral oil. A photo of the equipment installation at TACC is shown in figure 1.
Not shown in the picture is a small evaporative cooling unit, which is attached via a pipe to
the heat exchanger to the left of the rack. The basic operation of the system is to immerse
the servers vertically in the rack, and create a slow flow of oil through the servers from
bottom to top. The oil is then pumped through the heat exchanger, where excess heat is
transferred to water. The water is then cooled evaporatively outside (or through any
other cooling source).
The energy savings come from several sources. The primary advantage derives from the
fact that mineral oil is 1200 times more effective as a heat conductor than air. As a result
of this, energy can be removed from the systems much more efficiently. In an air cooled
solution, the ambient air in the datacenter is typically cooled to 65‐80 degrees F in order to
keep the actual processor chip inside the computers running below 140 degrees F.
Because the mineral oil is such a superior conductor of heat, in this solution the ambient
temperature of the oil can be raised to 100‐105 degrees F. In an air cooled data center,
water is normally chilled to 45 degrees in order to support the air conditioning units,
requiring significant energy. In the GR solution, the water to supply the heat exchanger
only needs to be cooled to around 90 degrees. Simply running the water through a small
evaporative unit and cooling it with outside air was sufficient to cool the oil throughout a
Texas summer, running 24 hours a day. A secondary source of power savings is that all
fans can be removed from the servers. The fans are used merely to increase the rate of airflow across critical components, to improve heat transfer. The heat conduction of mineral oil is high enough that no such acceleration is required. The removal of fans
further reduces the base operating power of the servers by 5‐15% (depending on load), in
addition to the savings in external cooling power.
Figure 1: The Green Revolution Prototype Cooling Rack at TACC. The system is able to efficiently cool computing
equipment even when exposed to outdoor air.
With the GR solution, typical commodity computer equipment runs fully submersed in the mineral oil with only a few simple modifications. For the TACC evaluations, we
used primarily Dell servers, with a few servers from other vendors. Typical server
modification for insertion in the oil took less than 10 minutes. When compared to other
liquid‐cooled solutions from other vendors, a substantial advantage of the GR approach is
the ability to use commodity‐built hardware from major vendors. For the bulk of the
evaluation at TACC, we ran a load of approximately 8 kW in the mineral oil continuously for
4 months. During this period, the measured total power for cooling (pumps, heat
exchanger, evaporative cooling unit) varied from 60 watts to 240 watts (depending on
pump rate and whether or not the cooling tower was active). The average power was
below 150 watts, or less than 2% of the total load.
The implications of this level of power savings are staggering. Consider the case of the
TACC data center that houses the Ranger supercomputer. This is a modern datacenter,
using in‐rack chilling technology and built for high efficiency less than 4 years ago. With a
PUE between 1.3 and 1.4, it is at least as good as any other modern large scale facility, and
vastly superior to the typical “data closet” used in most facilities in terms of efficiency.
Despite this efficiency, this datacenter uses approximately 1 million watts at all times for
cooling, or about 8.7 million kWh per year. The costs of generating this cooling are about
$400,000 annually, and require the equivalent energy output of about 3,000 tons of coal.
A switch to the GR technology would reduce this consumption by 98%, and that is before
taking into account additional savings from removing the server fans!
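As a quick sanity check on these figures, the short sketch below reproduces the arithmetic; the electricity rate is an assumed value chosen to be consistent with the $400,000 annual cost quoted above, not a published utility rate.

# Sanity check of the cooling-energy figures quoted for the Ranger datacenter.
# The electricity rate is an assumption chosen to be consistent with the
# roughly $400,000/year figure above; it is not a published TACC rate.

cooling_load_kw = 1000.0            # ~1 MW of continuous cooling power
hours_per_year = 24 * 365           # continuous operation

annual_kwh = cooling_load_kw * hours_per_year
print(f"Annual cooling energy: {annual_kwh / 1e6:.2f} million kWh")   # ~8.76

assumed_rate_per_kwh = 0.046        # assumed $/kWh implied by ~$400k per year
print(f"Annual cooling cost: ${annual_kwh * assumed_rate_per_kwh:,.0f}")

gr_savings_fraction = 0.98          # claimed ~98% reduction with GR cooling
print(f"Energy avoided with GR: {annual_kwh * gr_savings_fraction / 1e6:.2f} million kWh per year")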
When one considers that an estimated 1-3% of all US power consumption is already spent on datacenters, and that between one third and one half of that power is used for cooling, the enormous potential for savings with this technology becomes clear. GR Cooling estimates the savings for a “typical” rack (20 kW) to be $78,000 over the course of a 10 year
datacenter life.
Given the success of the trial and the tremendous potential of this technology, we moved on to other practical considerations of deploying this technology in a production-scale datacenter.
The first concern was reliability. This can be separated into two problems: the impact of mineral oil immersion on the reliability of the computing equipment itself, and the reliability of the cooling infrastructure. Given that in this pilot project we performed only
a 4 month evaluation with a single cabinet and a small number of servers, the reliability
results must be considered preliminary at best. However, the results so far are extremely
encouraging. We suffered no server failures during our evaluation, and prolonged
exposure to the oil seemed to have no ill effects. We believe that server reliability will
actually be slightly enhanced through this approach. The fans comprise most of the moving parts in a server (disk drives are the other), and are therefore a significant source of failures in normal operation; their removal can only improve reliability. Further, there is significant evidence in large scale
power systems that mineral oil immersion can improve component reliability (mineral oil
cooling is used in transformers and other components in the power grid). As a better
electrical insulator than air, mineral oil immersion should reduce micro‐arcs, small sparks
that normally corrode electrical connectors. This should provide another slight boost in
reliability.
During the course of our evaluation, we had only a single failure in the cooling
infrastructure. The root cause of the failure was faulty circuit wiring, with a shared ground
between multiple breakers causing a current overload. While this particular failure is
unlikely to recur, it does point out the need for redundancy in the infrastructure.
While our prototype has a backup pump and backup heat exchanger, it has only one cooling
tower, and a shared power feed to the primary and redundant pumps. A production
solution would require separate power feeds to the backup systems, and redundancy in the
tower. This is not an onerous requirement. Our current datacenter design has 7
independent CRAC (Computer Room Air Conditioner) units and 3 independent chilling
plants, to provide sufficient redundancy to survive failures. The GR infrastructure would
simply need similar levels of redundancy.
Another concern in putting this technology into production is density: could a datacenter layout built from this technology at scale support the same number of servers as an air cooled datacenter in the same square footage? We have put together a
comparative projection in conjunction with GR Cooling. The fundamental issue is that the
oil-cooled rack has the equivalent capacity of a normal rack: while a normal rack
provides 42U (42 “rack units”) of server space stacked vertically, the GR rack provides
those 42 units side-by-side. So, for a single cabinet, the GR unit occupies a larger physical
footprint. However, for a full size data center layout, there are a number of space
advantages.
A traditional air cooled rack requires an aisle to be left at both the front and the back of the
rack, of suitable size both to support sufficient airflow and to allow for maintenance and
removal of servers. The typical size of this aisle is 4’. With the GR solution, servers are
removed from the top of the rack, so two cabinets can be placed back‐to‐back, removing the
need for an aisle. While the GR racks require additional floor space for the pump
infrastructure, they remove the need for CRAC units (which are larger) to be placed on the
floor. The hypothetical layout for a 24 rack GR-based datacenter requires 38 square feet per cabinet. The current Ranger datacenter supports 100 racks in approximately 4,000 square feet, or roughly 40 square feet per rack, essentially the same footprint. So, an equivalent number of rack units can be
supported in the same physical space. The hypothetical layout of our mineral oil cooled
datacenter is shown in figure 2.
Perhaps a more important measure of density is the total wattage of computing capacity
that can be supported in a given sized datacenter. Current racks at TACC draw 30 kW. The limit of air cooling schemes may fall between 40 and 60 kW per rack. Though we do not yet have computers at this density to test the hypothesis, we believe 100 kW per rack is possible with the GR solution. Such dense computing hardware should become available in
the next 4 years. So, while a GR‐style datacenter should support an equal number of racks
as an air‐cooled datacenter, the quantity of equipment per rack should be substantially
higher with the oil‐based solution.
Figure 2: Hypothetical layout of a 24 cabinet mineral oil-cooled datacenter, with supporting infrastructure.
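The floor-space and power-density comparison sketched below simply restates the figures from the two preceding paragraphs; the 50 kW air-cooled value is an assumed midpoint of the 40-60 kW range, and the 100 kW per rack figure is the projection described above, not a tested configuration.

# Floor-space and power-density comparison restating the figures above.
# The 50 kW air-cooled value is an assumed midpoint; the 100 kW oil-cooled
# value is a projection, not a tested configuration.

gr_sqft_per_rack = 38                 # hypothetical 24-rack GR layout
ranger_sqft_per_rack = 4000 / 100     # Ranger: ~100 racks in ~4,000 sq ft
print(f"GR layout:     {gr_sqft_per_rack} sq ft per rack")
print(f"Ranger layout: {ranger_sqft_per_rack:.0f} sq ft per rack")

racks = 100                           # same rack count in the same floor space
air_cooled_kw_per_rack = 50           # assumed midpoint of the 40-60 kW limit
gr_kw_per_rack = 100                  # projected limit for oil immersion
print(f"Air-cooled room capacity: {racks * air_cooled_kw_per_rack / 1000:.1f} MW")
print(f"Oil-cooled room capacity: {racks * gr_kw_per_rack / 1000:.1f} MW")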
A final concern is the environmental and safety impacts of using mineral oil in the
datacenter at large scale. The mineral oil used in this solution is non‐toxic. In fact, grades
of mineral oil can be used within the racks that are considered fit for human consumption.
Handling and exposure pose no significant risks. This grade of mineral oil also poses
little fire hazard. While it will ignite in extreme heat, under normal conditions a lit match
can be dropped into the oil and will simply be extinguished.
TACC finds this cooling option to be particularly promising and exciting. The combination
of extreme energy savings, strong reliability, and the ability to simply adapt commodity
servers makes this a very attractive technology for a future green datacenter. TACC
continues to evaluate this technology and explore additional funding in conjunction with
GR to support this concept.
3.0 Datacenter Partnerships
3.1 Exploration of partnerships with Switch Technologies
Switch Technologies operates the SuperNAP datacenter in Las Vegas, a 150 MW, 400,000
square foot datacenter which is among the largest and most efficient hosting centers in the
world. Switch is seeking to build additional facilities of approximately 1 million square feet
at several strategic sites in the United States. TACC has forged a coalition with
UT‐Austin, UT‐System, and Austin community leaders to work with Switch to locate a new
large scale datacenter in Central Texas. Representatives of Switch have made several trips
to the area to meet with the Chamber of Commerce, local power companies, and to visit
potential sites. TACC staff have also arranged visits for university officials to Switch’s Las
Vegas facility.
As the ultimate goal of this project is to determine the feasibility of constructing a large
scale, shared datacenter in Texas, the relationship with Switch could be a critical
component. The aim of this investigation is to forge a partnership with Switch whereby Switch would invest in the construction and operation of this facility, and the university and other state entities would enter into a long term Memorandum of
Understanding (MOU). The addition of corporate customers Switch could attract (some of
which TACC could facilitate) would defray the cost to the state for the use of this facility.
Negotiations for an MOU with Switch and a possible site in Austin are ongoing.
The TACC team also continues to investigate other potential large scale datacenter options. As part of this project, we visited the Citibank datacenter in Georgetown and conducted site explorations at the Domain, the Met Center, and Montopolis to explore power and cooling options.
3.2 Proposals to potential datacenter partners
A non-technical issue that will be key to the ultimate success of any Texas datacenter proposal is the engagement of additional occupants of the facility beyond TACC. The
TACC team has engaged in a number of activities to gauge the interest level of other
potential customers in such a shared facility. Discussions thus far include:
• Presentations to TACC’s Science and Technology Affiliates for Research (STAR) industrial partners, including Chevron, BP, and Shell, on the concept of a high efficiency shared facility.
• Discussion with other Texas state institutions about the use of both a shared
facility and shared systems through the High Performance Computing Across
Texas (HiPCAT) consortium.
• Meetings with UT‐System about partnerships with the medical institutions to
share in a facility.
As a result of these discussions, several concrete partnerships have emerged that can be
leveraged in a future datacenter investment. One partnership was formed between TACC
and both Texas A&M and Texas Tech universities. Both Tech and A&M agreed to invest
$500k in the next TACC supercomputer, recognizing the increased capabilities and
economies of scale of investing in a large shared system. While this partnership focuses on
systems rather than datacenter space, it represents both an unprecedented level of
collaboration, and a foundation for future collaborations on shared facilities as well as
shared systems.
A second partnership of note is a relationship to back up UT System data in the TACC
datacenter. Part of this relationship will be the recognition of UT‐Austin and TACC as a UT‐
System “shared datacenter”; while not yet a statewide datacenter, and not yet a focus on
green, this partnership will allow all 15 UT institutions to purchase datacenter services
through TACC, a foundation for a larger statewide shared datacenter agreement.
In a related development, TACC is developing a proposal with the 15 institutions to develop
shared, centralized data storage and computation systems. This initiative would stop short
of a shared datacenter (the partner institutions would receive allocations on a time shared
computing cluster, and share space on a large disk system). While not yet funded, this
proposal is another step in the evolution to a vision of a large scale, shared facility, rather
than replicated facilities across the state.
In meetings with industrial partners to gauge interest in buying into a shared datacenter
facility (primarily the energy companies), we have found interest is high, though some
concerns remain. For instance, one large oil company finds our pricing on a shared facility comparable to its internal costs, but a recent large internal investment would preclude its participation for the next 4-5 years. Nevertheless, we feel there would be a strong
possibility of attracting at least a few corporate customers to be tenants should a statewide
datacenter be constructed.
4.0 Cloud/Virtualization technology evaluation
Technologies for cooling are only one part of the equation for building a high efficiency
datacenter. The other technologies to boost datacenter efficiencies are those which allow
multiple users to actually share hardware, reducing the total number of computers that
must be deployed. While there is currently much resistance to this idea, there is significant
precedent; in the early days of mainframe computing, all computers were timeshared, and
systems were so expensive that companies routinely leased time on shared systems. The
TACC supercomputers continue this philosophy, with the current systems supporting more than 400 projects and maintaining more than 90% utilization; much higher utilization than 400 small clusters would maintain, while using dramatically less total
hardware and power.
While timesharing has been around since the beginning of computing, the recent rise of virtualization technologies has enabled a new class of system sharing. While virtualization itself is an old idea, pioneered by IBM nearly 40 years ago, only recently has virtualization become practical on commodity, low-cost servers. Simply put, virtualization provides a user with the illusion of having their own server; however, this server is “virtual”: it exists inside a real server, may share that server with other virtual machines, and may move from one physical server to another. In the last few years, virtual machine (VM)
performance has improved to the point where for many applications, running in a VM is
nearly indistinguishable from running on a physical system.
To date, the project team has constructed a cluster of systems using the Xen VM software
that is now used for a variety of information technology tasks at TACC. Performance
characterization of these systems has shown thus far that processor intensive tasks suffer
little performance penalty, though Input/Output intensive applications may still see 20‐
30% degradation from VMs. In general, VMs have proven robust enough for normal use.
Indeed, virtualization is now the basis for a large number of commercial cloud and hosting
options, proving the viability of this technology at scale.
Most recently, the project team has set up a new virtual cluster using the Eucalyptus open
source software package. Eucalyptus allows the creation of a “private cloud”, i.e., it manages a set of physical resources and dynamically schedules sets of virtual machines from a queue of requests. The Eucalyptus software can run VM images compatible with Amazon’s Elastic Compute Cloud (EC2). The current evaluation of Eucalyptus is to analyze
the security model, i.e. to evaluate if the system will provide sufficient data protection to
allow multiple customers to share virtual machines in the same physical systems without
risk of exposing sensitive data. While VMs are now robust enough for the workloads of
many clients in a shared datacenter, the primary concern is protection of one client’s data
in VMs on shared systems from other users. This concern is discussed more fully in the
next section.
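Because Eucalyptus exposes an EC2-compatible API, a tenant can drive a private cloud with standard EC2 client tooling. The fragment below is a minimal illustration using the boto library; the endpoint host name, credentials, and image ID are placeholders rather than values from the TACC installation, and other EC2-compatible clients would work equally well.

# Minimal illustration of launching a VM on a Eucalyptus "private cloud"
# through its EC2-compatible API using the boto library. The endpoint,
# credentials, and image ID below are placeholders, not TACC values.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="eucalyptus", endpoint="cloud.example.edu")  # placeholder front end
conn = boto.connect_ec2(
    aws_access_key_id="EXAMPLE_ACCESS_KEY",
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
    is_secure=False,
    region=region,
    port=8773,                        # conventional Eucalyptus API port
    path="/services/Eucalyptus",
)

# Request a single small instance from a registered machine image.
reservation = conn.run_instances("emi-12345678", instance_type="m1.small")
instance = reservation.instances[0]
print("Launched instance:", instance.id, "state:", instance.state)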
Our evaluation included Eucalyptus and the XenServer software. A third viable, and perhaps the strongest, offering in this space is VMware. Budgetary considerations precluded a thorough evaluation of VMware. The evaluations of Eucalyptus and XenServer included hands-on experience with small clusters running multiple VMs, as well as interviews with the operators of the largest current installation of Eucalyptus, the Nimbus system at Argonne National Laboratory.
The evaluation determined that Eucalyptus is useful for research and academic projects,
but is not yet mature enough for large production installations. While smaller sets of
systems work well, at large scale there are still many bugs at “edge conditions” as the set of VMs exhausts the resources of the physical servers. XenServer seemed more bulletproof and stable, and offered more consistent and repeatable performance. Anecdotal evidence
implies VMware ESX is perhaps the most robust, but at a very high cost.
The VM software evaluation proved to us that virtualization at this point is a robust enough
solution to provide server sharing for a set of systems where there is trust among users (i.e.
within many groups in a large company). However, security concerns with all of these
products make us hesitant to recommend this solution to enable the sharing of servers
between distinct customers who have a competitive relationship or lack explicit trust.
See the further discussion of this issue below in section 5.
5.0 Practical issues in a shared, high efficiency datacenter
Technically, there are very few barriers remaining to building a large scale, multi-customer, shared-server datacenter; in fact, a number of such datacenters already exist. However, there are still a number of practical barriers to constructing a large datacenter that effectively implements sharing within a rack. Most of these issues stem as much from perception as from technical reality.
The most fundamental issue is data security. The perception exists that any
data stored with a remote provider is inherently less secure than data stored at the owner’s
site. This assertion is almost certainly untrue. Maintaining data security is an ever more
complex task. The list of compromises of individual datasets by companies or government
agencies is simply too long to even attempt to elaborate. In this climate of growing
complexity, there is a distinct advantage to having a large investment in security
professionals. While organizations still perceive that keeping data “in‐house” is the most
secure method for storing data, the simple fact is that large, IT focused organizations are
much more likely to be able to sustain the investment required to truly make data systems
secure. The rise of “cloud computing” providers is slowly changing this culture. Companies like Amazon.com and Google are capable of investing tens of millions of dollars quarterly into security for shared systems, a larger investment than most universities, government agencies, or small to mid size companies can make in IT in total. The success of the cloud model is likely to slowly shift this mindset, with “remote” becoming synonymous with “secure”.
From a technical perspective, virtually all organizations have mastered the art of
distributing secure access to their systems. The widespread use of VPN (Virtual Private
Network) technology has made it routine for companies to offer secure access across
distributed worksites, or to employees outside the office or at home. The move to a remote datacenter simply requires the psychological adjustment that one of the sites on the VPN is no longer in a company-owned space. The technical challenges for a remote datacenter are largely solved, at least for the case of dedicated servers living in a remote space. For
shared, multi‐client servers, more technical barriers remain, and the use of secure
virtualization software is being investigated as part of this study. While the technology
exists to isolate users from each other effectively through shared systems, the weakness in
current virtualization software is that the administrators of the shared systems can still
access all of the machine images on the shared system. So, either a framework of trust
must be developed with providers, or current offerings must be limited to a “shared space”
but not a “shared system” approach.
This is perhaps not as large an issue as it would at first appear. First of all, trust can likely
be established between a large enough set of groups to allow a relatively large set of
servers to be shared (for instance, all state agencies sharing servers operated through a
state‐run virtual pool). Second, most organizations require many servers. The benefits of
sharing within the group provide almost all of the available advantage through
virtualization; sharing with another organization provides at best a marginal benefit.
Consider for instance this example. Suppose three departments choose to use
virtualization to consolidate their servers. Also suppose that each physical server can
accommodate up to 8 virtual machines (in practice this is slightly more dynamic, but the
principle holds). Suppose department A has 30 servers, department B 35, and department
C 20. Let’s consider a “server” to be a physical machine consuming 500 Watts at nominal
load. Figure 3 below lists the number of physical servers required by each department, and
total, in each of three scenarios: Physical servers, virtualization with no shared servers
between departments, and full virtualization with shared systems.
Scenario     Physical servers    Virtualization    Virtualization, shared between departments
Dept. A            30                   4                 3.75
Dept. B            35                   5                 4.375
Dept. C            20                   3                 2.5
Total         85 (42.5 kW)         12 (6 kW)          11 (5.5 kW)
Figure 3: Comparison of server count and power consumption for 3 departments using physical servers, 8 way
virtualization, and virtualization with sharing between departments.
In this scenario, simply employing virtualization within a department reduces total power from 42.5 kW to 6 kW, a reduction of 86%. The additional step of sharing servers between departments in this case saves only an additional 0.5 kW, less than 2% of the original power (though a more significant 8% of the virtualized power). A datacenter
could be made that was substantially “green” simply by providing virtual pools between
groups that can establish a sufficient level of administrative trust.
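The figures in this example follow directly from rounding up the number of physical hosts each department needs; the short sketch below reproduces them under the same assumptions of 8 VMs per host and 500 watts per physical server.

# Reproduces the consolidation arithmetic behind Figure 3, assuming
# 8 VMs per physical host and 500 watts per physical server.
import math

VMS_PER_HOST = 8
WATTS_PER_SERVER = 500
departments = {"Dept. A": 30, "Dept. B": 35, "Dept. C": 20}

total_physical = sum(departments.values())

# Virtualization within each department: each department rounds up separately.
dept_virtualized = sum(math.ceil(n / VMS_PER_HOST) for n in departments.values())

# Full sharing across departments: round up once over the pooled demand.
shared_hosts = math.ceil(total_physical / VMS_PER_HOST)

for label, count in [("Physical servers", total_physical),
                     ("Virtualized per department", dept_virtualized),
                     ("Virtualized, shared pool", shared_hosts)]:
    print(f"{label:27s}: {count:3d} servers, {count * WATTS_PER_SERVER / 1000:.1f} kW")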
While the technical challenges of a shared datacenter are largely solved, a more complex
set of problems revolves around the legal and regulatory framework regarding the retention
and privacy of data. If an organization stores, for instance, data about medical patients, the
datacenter and systems housing it must be in compliance with the federal HIPAA laws.
Data such as customer names and credit card numbers has a less rigorous legal standard,
but still carries significant civil liability risk if disclosed. The legal and regulatory
frameworks surrounding privacy issues add substantial complications for the operation
both of shared datacenters and shared systems. Many organizations currently interpret
HIPAA, for instance, to essentially require all HIPAA systems to reside in an isolated,
physically secure datacenter.
However, this typically seems to involve an overly conservative interpretation of the
regulation. With the increased emphasis on electronic medical records, cloud services
providers such as DiCom Grid (http://www.dicomcourier.com/about/) have begun to
demonstrate HIPAA compliant services delivered over the internet from a shared, secure
datacenter. DiCom’s ImageCare platform allows small health care facilities to store images
remotely at DiCom’s datacenter while remaining compliant. A much more rigorous software approach to encryption and data transfer is required; however, shared facilities are still possible. For large hosting providers, typical solutions involve building secure
“zones” within a shared datacenter, where separate network facilities can be provisioned,
and separate access logs detailing who gains physical access to that zone can be maintained
and audited.
To summarize, security and comfort with the idea of “outsourced” systems remain
significant barriers to a multi‐use datacenter, but the rise of cloud computing is changing
the perception of these issues. For a datacenter with shared physical space, there is primarily a social problem, secondarily a legal/compliance problem that is probably manageable, and very few technology problems remain. For a facility with shared
systems, there are more significant but likely solvable technology problems, but perhaps
insurmountable legal and regulatory problems. Fortunately, large cloud providers with
significant political influence, including Google, Microsoft, and Amazon, may be able to
affect the relevant regulations and laws over time.
6.0 Final Recommendations
The primary conclusions of this project are that (1) there are significant economies of scale in large datacenters, (2) datacenters can be built that are dramatically more efficient than typical datacenters that exist today, and (3) it is practical, at the very least, to share the physical facilities between many customers. At this point, it is less clear that
sharing at the individual server level is practical – while technically feasible, it is unclear if
most customers at this point would be willing to accept the perceived security risk. We
believe firmly that the psychological, regulatory, and legal barriers that have in the past
inhibited large scale shared facilities are already surmountable, and are becoming
progressively lower as market forces reduce them. It is also clear that the gap in
efficiency between a large scale, professionally run datacenter, and a small, in‐house
datacenter has grown substantially, and continues to grow. Consider that a cutting edge
green datacenter may now involve a many megawatt facility with redundant high voltage
lines, a mineral oil cooled infrastructure on an industrial floor, a suite of virtualization
services to be monitored and maintained, a set of information security experts, a set of
compliance experts capable of generating secure network segments and tunnels for routing
sensitive data subject to regulatory concerns, and 24 hour operations. How many
individual organizations have it within the scope of their budget to achieve this level of
efficiency, and provide the many kinds of expertise required for reliable, secure
operations? How many organizations within the state should replicate this expensive
functionality?
This study has left little doubt that if constructed, a statewide datacenter could be made
substantially more efficient than the many small existing datacenters in the state. It would
be adopted across the many universities of the state, by industrial partners, and hopefully
by state agencies. Collaborations required to do this are forming in an ad hoc way even
without a central facility, as shown by TACC’s new partnerships with UT System, Texas
A&M, and Texas Tech.
The fundamental barrier at this point is initial capital; which state entity will bear the
budgetary load of initial construction of such a facility? TACC is eager to work with other
entities throughout the state and the legislature to see such a project brought to fruition.
As part of this project, TACC commissioned a cost estimate and design for construction of a
complete datacenter at UT‐Austin’s Pickle Research Campus. The summary of costs is
presented in Appendix C. A finding of this study was that the campus would require
substantial infrastructure upgrades to support a datacenter at the scale proposed, perhaps
driving up construction costs by as much as $24M. An alternate location with more
available electrical power infrastructure would substantially reduce construction costs.
TACC is pursuing possible alternate sites.
Appendix A – New datacenters around the world
Data Centers opened within the last 12 months or currently being built with a focus on “green”
technologies.
1. University Data Centers
Indiana University Data Center
Highlights:
• $32.7 million funded by Academic Facilities Bonds ($18.3 million), Infrastructure
Reserves ($8.4 million) and Capital Projects/Land Acquisition Reserves ($6 million).
• Single story bunker style design
• A ten- to twelve-foot-high berm around most of the building. This berm, planted with
native grasses and drought-resistant plants, improves insulation and reduces heat gain on
exterior walls and eliminates the need for a potable water irrigation system.
• 2 x 1.5 MW generators with a rough-in for a third expansion unit
• 2,200 tons of cooling (1,100 now and 1,100 future) via 4 chillers (3 now and 1 future) and 4 cooling towers (2 now and 2 future) rated to withstand 140 MPH winds
• 2 flywheel UPS units rated at 750 kVA/675 kW, with flywheel energy storage providing approximately 20 seconds of ride-through at full load, with provisions for a future third unit. UPS systems are paralleled in an N+1 arrangement.
• 1 UPS rated at 500 kVA with battery energy storage providing 8 minutes of ride-through at full load
• 2 x 1800 A @ -48 V DC power system for the MDF room
• 1,500 sq ft of raised floor for the MDF room
• 3 primary machine room pods
Complete list of specifications: http://it.iu.edu/datacenter/docs/IU_Data_Center_Facts.pdf
Syracuse University Green Data Center (GDC)
Highlights:
• IBM provided more than $5 million in equipment, design services and support to the
GDC project, including supplying the power generation equipment, IBM BladeCenter,
IBM Power 575 and IBM z10 servers, and a DS8300 storage device.
• The New York State Energy Research and Development Authority (NYSERDA)
contributed $2 million to the project.
• Constructed in accordance with LEED “Green Building” Principles
• The SU GDC features an on-site electrical tri-generation system that uses natural gas-
fueled microturbines to generate all the electricity for the center and cooling for the
computer servers. The center will be able to operate completely off-grid.
• IBM and SU created a liquid cooling system that uses double-effect absorption chillers to
convert the exhaust heat from the microturbines into chilled water to cool the data
center’s servers and the cooling needs of an adjacent building. Server racks incorporate
“cooling doors” that use chilled water to remove heat from each rack more efficiently
than conventional room-cooling methods. Sensors will monitor server temperatures and
usage to tailor the amount of cooling delivered to each server—further improving
efficiency.
• The GDC project also incorporates a direct current (DC) power distribution system.
National Petascale Computing Facility (NPCF), NCSA, University of Illinois
Cost: unknown
Data Center Space: 20,000 sq ft
Total Facility Size: 88,000 sq ft
Green Features: Yes
Website: http://www.ncsa.illinois.edu/AboutUs/Facilities/pcf.html
Highlights:
• The facility is scheduled to be operational June 1, 2010
• Data center will house the Blue Waters sustained-petaflop supercomputer and other
computing, networking, and data systems
• NPCF will achieve at least LEED Gold certification, a benchmark for the design,
construction, and operation of green buildings.
• NPCF's forecasted power usage effectiveness (PUE) rating is an impressive 1.1 to 1.2,
while a typical data center rating is 1.4. PUE is determined by dividing the amount of
power entering a data center by the power used to run the computer infrastructure within
it, so efficiency is greater as the quotient decreases toward 1.
• The Blue Waters system is completely water-cooled, which (according to data from IBM)
reduces energy requirements about 40 percent compared to air cooling.
• Three on-site cooling towers will provide water chilled by Mother Nature about 70
percent of the year.
• Power conversion losses will be reduced by running 480 volt AC power to compute
systems.
• The facility will operate continually at the high end of the American Society of Heating,
Refrigerating and Air-Conditioning Engineers standards, meaning the data center will not
be overcooled. Equipment must be able to operate with a 65F inlet water temperature and
a 78F inlet air temperature.
Interview with IBM Fellow Ed Seminaro, chief architect for Power HPC servers at IBM, about
this synergy as well as some of the unique aspects of the Blue Waters project:
http://www.ncsa.illinois.edu/News/Stories/Seminaro/
Princeton University Data Center
The new facility would have approximately 40,000 square feet and would comprise three
functional components: a computing area; an electrical and mechanical support area; and a small
office/support area. The two-story building would be about 50 feet high.
If approvals and construction proceed as planned, the facility would be operational in 2011 and
would be staffed by three people. It is expected to support the University's program needs
through at least 2017. The facility is sited to allow for future expansion; a second phase of
construction potentially could double the square footage.
http://www.princeton.edu/main/news/archive/S26/39/58I51/index.xml?section=topstories
2. Corporate Data Centers
Recently opened or announced “green” corporate data centers.
2.1 GHD
Cheyenne, WY
Highlights:
• GHD is a 10,000 sq. ft. data center that is powered entirely through renewable wind
energy.
• GHD operates its facility at approximately 40-60% greater energy efficiency than the
average data center.
• The data center leverages the following attributes to gain the efficiencies:
o Air-Side Economizers – Free cooling from Cheyenne's average annual
temperatures of 45.6 degrees.
o Hot-Aisle Heat Containment – Maximizing cooling efficiency by enclosing the hot aisle and capturing or exhausting heat as our state of the art control system determines.
o Modular Scalable Data Center – matching maximum efficiencies without over
building and waste.
o Efficient Floor Layout and Design – aligning hot aisle/cold aisles and redefining
the cage space concept.
o Highly Efficient IT Equipment – spin down disk technology and servers with the
highest power to performance ratios.
o Virtualization – cloud computing environment that reduces energy waste from
idle servers and IT equipment.
• Power – GHD has built an N+1 electrical power infrastructure that delivers power to its
customers in a true ‘A’ and ‘B’ power configuration. The facility has the capability of
providing up to 12.5 kW of power to each data cabinet. The facility receives its power
from the local power company via a redundantly switched substation. Our internal power
infrastructure includes ATS, Generator and UPS protection to each rack.
2.2 HP Wynyard
Wynyard, United Kingdom
Highlights:
• Inside the facility's large first-floor plenum, air is prepared for introduction into the IT equipment area.
When the outside air entering the facility is colder than needed, it is mixed with the warm
air generated by the IT equipment, which can be re-circulated from the upper floor into
the lower chamber.
• Filtering and Airflow: HP uses bag filters to filter the outside air before it enters the
equipment area. Once the air is filtered, it moves into the sub-floor plenum (which is
pressurized, just like a smaller raised floor) and flows upward through slotted vents
directly into the cold aisles of the data center, which are fully-enclosed by a cold-aisle
containment system. Capping the cold aisles in a “cool cubes” design allows the system
to operate with lower airflow rate than typical raised floors in an open hot aisle/cold aisle
configuration.
• Racks and Containment: HP uses white cabinets to house the servers at its Wynyard data
center, a design choice that can save energy, since the white surfaces reflect more light.
This helps illuminate the server room, allowing HP to use less intense lighting. Another
energy-saving measure is the temperature in the contained cold aisle, which is maintained
at 24 degrees C (75.2F).
• Cabling and Power: The unique design of the HP Wynyard facility, with its large first-
floor plenum for cool air, also guides decisions regarding the placement of network
cabling, which is housed above the IT equipment. The UPS area is located in the rear of
the lower level of the data center, following the recent trend of segregating power and
mechanical equipment in galleries apart from the IT equipment. Heat from the UPS units
is evacuated through an overhead plenum and then vented to the outside of the building
along with waste heat from the server cabinets.
• At an average of 9 pence (11.7¢) per kWh, this design will save Wynyard approximately
£1m ($1.4m) per hall, which will deliver HP and its clients energy efficient computing
space with a carbon footprint of less than half of many of its competitors in the market.
http://www.communities.hp.com/online/blogs/nextbigthingeds/archive/2010/02/12/first-wind-
cooled-data-center.aspx?jumpid=reg_R1002_USEN
2.3 Facebook
Prineville, OR
Highlights:
• Announced January 21, 2010 - First company-built data center; construction phase to last 12 months.
• The 147,000 square foot data center will be designed to LEED Gold standards and is
expected to have a Power Usage Effectiveness (PUE) rating of 1.15.
• The data center will use evaporative cooling instead of a chiller system, continuing a
trend towards chiller-less data centers and water conservation.
• The facility will also re-use excess heat expelled by servers, which will help heat office
space in the building, a strategy also being implemented by Telehouse and IBM.
• UPS design - The new design foregoes traditional uninterruptible power supply (UPS)
and power distribution units (PDUs) and adds a 12 volt battery to each server power
supply. This approach was pioneered by Google.
• Facebook Green Data Center Powered By Coal? – Recent reports suggest the announced
data center was not as green as previously thought. The utility supplying its electricity relies primarily on coal rather than hydro power.
o Facebook responds to the criticism:
http://www.datacenterknowledge.com/archives/2010/02/17/facebook-responds-
on-coal-power-in-data-center/
Appendix B – Comparison of rack top to chilled water data centers (commissioned as part of a datacenter design for UT from HMG Associates)
A. Chilled water rack door coolers.
• Requires heat exchanger loop and higher chilled water supply (50°F-55°F
Supply).
• Does not control high temperatures in the cold aisle – requires supplemental
cooling.
• If chilled doors were utilized, the additional cost to the project would be about
$1,800,000 for the equipment only.
B. In Rack Coolers (IRC) placed in-row with the servers in a cold aisle/contained hot aisle
configuration. This option is more efficient than the traditional CRACs and similar to
Ranger. We have asked APC to look at adding some additional IRCs in order to lower
the noise level, and evaluate if it is a cost effective option. If we used the IRCs we
would use traditional CRAC units for cooling the envelope and miscellaneous equipment
heat not in the in-row rack line up. The CRACs would also be able to cool
approximately 350 kW of additional miscellaneous equipment loads as currently
configured.
Recommendation: Because the chilled water door capacity is 25% less than actually needed, the cost benefits are dramatically reduced. Based upon this, my recommendation is to utilize the IRC option noted above, as discussed in our OPR meetings of January 20 and January 21, 2010.
Appendix C: Cost estimates for Datacenter
The following is a detailed cost plan for a 5 MW datacenter to be constructed at the TACC facility, prepared by an independent design firm. This design assumes a conventional raised floor facility, with in-row cooler technology.
C.1 Site Plan
C.2 Preliminary Cost Estimates
Base Work, cost for the machine room expansion and the necessary satellite chilled
water central plant (sized only to support the expansion project)
Base Costs $32,000,000
Additional Costs, proposed by Facilities Services (their Option 2):
• Install 2 – 2,500 ton water cooled modular chillers at the site (which increases the capacity from 3,300 tons proposed above to 5,000 tons) – Adds $2,000,000
• Electrical upgrade from sub‐station to CP1 then TACC ‐ Adds $10,000,000
• Austin Energy Transformer upgrade from 13 MW to 30 MW – Adds $12,000,000
• Existing air cooled chillers would be shutdown and left in place for backup
Additional Costs $24,000,000
TACC Machine Room Expansion, with the Additional Costs
$56,000,000*
*Note: the estimate is based on a construction start in 2010. A later start will need to
add escalation. The Electrical upgrade costs have turned out to be required, so they are
not optional. The total does not include the computer equipment and networking inside
the machine room expansion, for which TACC is going to seek grant funding.
C.3 Preliminary Cost Estimates (for the base work included above)
C.4 Projected Schedule
Austin Energy projects a 24 month lead time for transformer upgrades at the Pickle
Research Campus, so any construction at this site could not be completed before
2013 if funding were immediately available.
The transformer upgrade will be the dominant schedule factor in construction, so
any project at Pickle would require at least 2 years from approval of funding.
The outcome of this site plan and cost analysis indicates that the Pickle campus may
be a poor choice for a new data center site. Costs could be substantially reduced by
moving to a site with available electrical infrastructure (such as Montopolis), or by
moving out of UT facilities entirely.
With the electrical upgrades removed, a 4 month period is estimated to complete the design, followed by a 9 month period for facility construction.