
Commissioning in the Time of Corona

Shimon Katz, Cx Manager, HIT Cx

Introduction
This article describes the challenges of commissioning a TIER IV critical facility against
the backdrop of the Corona pandemic and the difficulties the pandemic added to the commissioning
process.

In the realm of Israeli construction, the commissioning field is not well developed. Before
discussing the processes themselves, a glossary of terms is provided in order to create a
common language:

Commissioning – “provide documented confirmation that building systems and assemblies
are designed, installed, and function in compliance with criteria set forth in the Project
Documents to satisfy the owner's operational needs.” [1]

Standard – a document detailing the requirements applicable to a product or service so
that it conforms to its purpose [2].

Certification – a procedure in which a professional, authorized, independent party examines
a process, product, or service and certifies in writing that it conforms to defined standard
requirements.

TIER – one level out of several within the critical facilities sphere.

The UPTIME Institute and the Rating Levels


The UPTIME Institute began as a private research organization, founded in 1987 [3]. The Institute
established a network of 68 members, many being Fortune 100 companies [4]. In 1995, the
Institute published a White Paper that for the first time defined the four tiers that differentiate
between classes of datacenter facilities. The paper presented four possible configurations,
which according to the Institute enable different availability levels.

In 2009 the Institute was acquired by the 451 Group research firm and became its operating
division. In 2019, the group was acquired by S&P Global Market Intelligence.

[1] From definitions of the U.S. Building Commissioning Association: https://www.bcxa.org/wp-content/uploads/2019/09/NEW-CONSTRUCTION-BEST-PRACTICES-Final-Cover.pdf
[2] From the website of the Standards Institution of Israel: https://www.sii.org.il/he/israelistandards
[3] https://en.wikipedia.org/wiki/451_Group
[4] Turner, Seader, & Brill, Tier Classification Defines Site Infrastructure Performance, Uptime Institute White Paper, 1996.

In 2008, the Institute published the Data Center Site Infrastructure Tier Standard: Topology
document. Unlike the white papers published up to this time, this document was worded as a
standard and no longer contained diagrams and recommendations, only minimum
requirements and a definition of indicators.

It is important to emphasize that a private organization, such as the UPTIME Institute, cannot
publish an official standard, for two key reasons:

• A standard is published by an official / public organization and requires a broad
stakeholder consensus.
• Standardization can be performed only by certification institutes qualified for this purpose.

Nonetheless, the document's wide distribution and use have transformed it into a de facto
standard even if it does not meet the formal definitions.

Furthermore, it is important to emphasize that there is no legal obligation to use this standard.
According to the Institute, the leading purpose of the standard is to identify the anticipated
performance that will lead to the availability required by the owner. The owner elects to adopt
it based on various considerations of quality and risk reduction:

• Standardization and ensuring similarity of topologies with proven reliability.
• An understanding that the process (detailed below) adds a layer of quality assurance
to design and to performance.
• Reduction of insurance premium payments [5].
• Recognition of the standard as an accepted international "mark of quality" in the realm
of computer facilities.

The standard document is updated from time to time. The last update dates to 2018.

The updated Tier Requirements Summary table for 2018

[5] https://uptimeinstitute.com/lower-risk-and-insurance-premiums-with-tier-certification

The Institute's four tiers are differentiated by a number of key indicators: redundancy, single
points of failure, delivery paths, fault tolerance, and continuous cooling.

In the 2018 update to the standard, several changes were made compared to previous
versions:

A requirement was added that, in TIER III as well, the delivery path from the UPS system to the
IT load be via two simultaneously active paths.

Continuous cooling – the requirement was changed to ensure continuous cooling for the
period from the moment of power failure until the power supply is restored and the cooling
system returns to operation.

Certification processes
The UPTIME Institute has several tracks for certification:

• Certification of the facility [6].
• Operational certification [7].

The certification process for the topology is a multi-stage process:


• Description of the facility – in the first stage, a detailed description of the facility must
be sent to the Institute, which includes, among other things, the:
o Basis of the design
o Details of the load
o SOO [8] for air conditioning, fuel, and electrical systems
o Calculations of fuel consumption and storage capacities
o Detailing of environmental conditions of the facility and derating calculations
for extreme heating/cooling loads (once every 20 years)
o Compartmentalization of paths (Tier IV)
o Continuous cooling calculations (Tier IV)
o Elimination of potential single points of failure for emergency power off
buttons, safety systems, etc.
o Single-line schematic diagrams and plans should be attached to the description
of the facility.
• In the second stage, the design is reviewed. The Institute inspects the design methodically
and forwards a list of open questions and issues that need to be addressed; if necessary,
the design is revised. Eventually, after several rounds of revisions, the design is approved
and serves as a basis for certification of the facility.
At the end of this process, the Institute issues a TCCD (Tier Certification of Design
Documents) approval, which certifies that the design meets the requirements for the tier.

[6] Data Center Site Infrastructure Tier Standard: Topology
[7] Data Center Site Infrastructure Tier Standard: Operational Sustainability
[8] SOO – Sequence of [Facility] Operation

The approval of the design is valid for only two years; at the end of this period, if
material changes have been made in the standard, the design needs to be revised accordingly.

In the runup to the testing time, the Institute forwards a list of demonstrations. This list
comprises all the demonstrations they intend to perform in the building after construction is
completed. The demonstrations are divided into several key categories:

• Operation and load tests when switching from feed from the power grid to generators
in routine and fault scenarios.
• Demonstrations focused on generators in times of routine and fault, given that the
Institute views generators as the primary feed source for tiers III and IV.
• Fault demonstrations – simulation of faults and demonstrating that they have no effect
on the operation of the critical environment.
• Maintenance demonstration – simulation of component maintenance while they are
fully disconnected from all supply sources (electricity, water, fuel) and testing that
maintenance has no effect on the operation of the critical environment.
• Measurement and monitoring of temperatures, capacities, and fuel levels to ensure
that the sizes of the various reservoirs meet the requirements of the standard.

The customer must ensure that the facility is ready for the tests to be performed.

The demonstration list arrives with no defined order of performance. It is recommended to
arrange the tests in a logical order that follows the structure of the facility, so that the
team does not have to skip between systems or between remote parts of the facility.

• At a prearranged time, an examination team arrives to observe the demonstration
process. The team's visit usually lasts about a week, during which the customer
/ contractor is required to demonstrate the tests while the examiners
verify the operation of the facility and monitor its stability. The team also examines
signage, isolation of paths, etc. The team comprises at least 2 examiners. Before
comments or approvals are forwarded to the customer, the examiners' work is inspected
by control officials within the Institute. Moreover, examiners
vary between projects so that ordering parties / customers do not know the identities
of the officials they will work with and who will examine the various deliverables.

After compliance with the requirements is proven, the Institute certifies that the facility has
been constructed in accordance with the standard and issues a TCCF - Tier Certification of
Constructed Facilities - confirmation that the facility, as constructed, complies with the
requirements of the tier.

Examples of the Institute's approval of facility certification: TCCD approval at the Company
site, TCCF approval at the Company site, and physical TCCF approval for display at the facility.

The Institute trains personnel in various courses, among them the ATD (Accredited Tier
Designer) course intended for computer facility designers. According to the Institute, the
course aims to give designers tools for professional development and a thorough
understanding of the standard and its requirements, thereby saving them and their
customers time and money when they design a facility and submit it for the
Institute's certification.

However, the Institute reiterates, in numerous documents, that its training sessions do not
grant course graduates permission to certify the tier of a design, to certify the
tier of a facility as constructed, or to certify conformance to the operational standard.
Furthermore, the UPTIME Institute, as a private entity, maintains that the TIER concept for
computer facilities is proprietary.

The advantages of the Institute's training sessions, from the UPTIME website [9].

[9] https://uptimeinstitute.com/education/course-details/accredited-tier-designer-atd-course

Certification of facilities in Israel
There are many diverse computer facilities in Israel. Without diminishing the reliability of these
facilities, only a few have undergone a formal process of certification to the UPTIME Institute's
standards. The first facility to have undergone full TIER III certification was the "Rotem" facility
(2015). Subsequently, the design of 3 additional facilities was approved:

• Ministry of Justice – TIER III (2021)
• SDS - Shonfeld Data Services Data Center – TIER IV (2020)
• A facility for an official body of the State of Israel – TIER IV (2017) – described below
in this article.

Challenges of commissioning a facility for TIER IV

Commissioning and certification processes are complex processes at their core, inasmuch as
there is a need to work according to a standard, orderly methodology consisting of many
components and a need to coordinate between them.

Worldwide, about 1,700 facilities have been approved by the UPTIME Institute [10], most to TIER
III. Only about 50 facilities worldwide have received TCCF approval as TIER IV-rated
constructed facilities. The Institute's website shows that at the TIER IV rating there
are many facilities for which only the design has been approved (TCCD) but whose owners, for
their own reasons, did not complete the process at the end of construction and did not
receive TCCF approval.

Obtaining TIER IV-rated approval for a facility poses several difficult challenges for designers
and contractors:

Isolation of paths

TIER IV requires compartmentalization [11] and isolation of paths and fire areas. Even with a
meticulous and coordinated design, it is difficult to ensure that each space is fed via two
physically isolated paths. Beyond the design challenge, it should be ensured that the
construction also meets the definitions of the standard. Meticulous monitoring should be
performed so that all pipeline routes, electrical cabling, and communication-and-control
cabling indeed pass through physically isolated routes.

Continuous cooling

The continuous cooling criterion requires a cold water reservoir. The updated standard (2018)
is more lenient compared to previous versions, which demanded that the storage be for the
same period of time as the UPS battery backup time. Currently, the requirement is to ensure
continuous cooling for the period of time between the failure of the power supply and until the
full return to operation of the cooling system. This challenge requires meticulous design and
testing. Both the design and the SOO need to address a situation where the chillers are non-
operational, but the pumps continue to circulate water. It should be ensured that the water
circulation indeed passes through the cold water reservoirs and is not routed to other paths

[10] https://uptimeinstitute.com/uptime-institute-awards/list
[11] Compartmentalization

due to bypass valves or mixing valves between the supply and return manifolds, and that the
reservoir actually participates in supplying the cold water.
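
The sizing logic behind the continuous cooling requirement can be illustrated with a simple energy balance. The sketch below is not taken from the standard or from the project's calculations. It assumes an ideal, fully mixed reservoir with no losses, and the function name, the 10-minute restart time, and the 6 K usable temperature rise are illustrative assumptions. Only the 300 kW load figure comes from this article.

```python
# Energy-balance sketch for a chilled-water reservoir (illustrative only).
# Assumptions, not taken from the article or the standard: an ideal, fully
# mixed reservoir with no losses and constant water properties.

WATER_DENSITY_KG_PER_M3 = 1000.0          # approximate
WATER_SPECIFIC_HEAT_KJ_PER_KG_K = 4.186   # approximate


def required_reservoir_volume_m3(load_kw, restart_time_s, delta_t_k):
    """Water volume needed to absorb load_kw for restart_time_s seconds.

    delta_t_k is the usable temperature rise of the stored water in kelvin.
    """
    if load_kw < 0 or restart_time_s < 0 or delta_t_k <= 0:
        raise ValueError("load and time must be >= 0 and delta_t_k must be > 0")
    energy_kj = load_kw * restart_time_s  # kW x s = kJ
    mass_kg = energy_kj / (WATER_SPECIFIC_HEAT_KJ_PER_KG_K * delta_t_k)
    return mass_kg / WATER_DENSITY_KG_PER_M3


if __name__ == "__main__":
    # 300 kW is the load quoted in this article; 600 s and 6 K are assumed.
    volume = required_reservoir_volume_m3(300.0, 600.0, 6.0)
    print(f"Required reservoir volume: {volume:.2f} m3")
```

With these assumed inputs the result is roughly 7 m3 of usable water. A real design would add margins for stratification, piping volume, and the portion of the reservoir that actually takes part in the circulation, which is exactly what the demonstration is meant to verify.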

Building control and fault tolerance

The most significant disparity between the lower tiers and TIER IV is the fault tolerance
criterion. This is a complex requirement, which is not well defined in the standard, but is
examined thoroughly by the Institute. The essentials of the requirement are as follows:

• To detect the fault.
• To isolate the fault from the system.
• To raise an alert regarding the fault in the control system.
• All this in parallel with the continued operation of the facility.

This is a significant disparity compared to TIER III, where some operations may be
performed manually. For instance, valves in the cooling system may be operated manually,
even though in nearly all facilities these operations are actually performed by the control system.
Naturally, the issue is not examined within the framework of the standard certification.

At TIER IV, it should be ensured already at the design stage that each fault can be detected via
the control system. The complete dependence on the control system in this area mandates,
among other things, that this system also be free of single points of failure, that the
controllers, their power supply, and their communication network meet all the tier requirements,
and that no single fault can escalate into a complete failure of the critical facility.
When designing the operation of the system, it is necessary to design how each possible fault
is to be contained, for instance:

• Detection of leaks and spillages and the closure of valves so that the fault area is
isolated from the rest of the system, with continued operation of the critical
environment. This monitoring is required in the air conditioning and fuel systems.
• Proof of the ability to tolerate a leak in a pipeline and a demonstration of where the
water will flow and how it is to be drained or pumped.
• Proof of handling a fault or leak in systems (primarily air conditioning) that are
connected both to side A and to side B: how the fault is detected and how the unit is
disconnected from both sides without affecting the operation of either
leg, with continued functioning of the critical environment.

During the performance stage, it should be ensured, prior to testing, that every fault is indeed
monitored in the control system: when a fault occurs, an alert regarding the incident is
displayed for the operator, and the SOO operates as required and contains the
fault, as designed.

Moreover, the entire testing process is performed with dummy loads
installed in the facility at a capacity corresponding to the designed capacity of the facility
(in our case, about 300 kW).
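
The four essentials of the fault tolerance requirement listed in this section (detect, isolate, alert, continue operating) can be written down as a pass criterion for a single fault demonstration. The following is a minimal sketch for use in a pre-test checklist. The record layout, field names, and function names are hypothetical and do not represent the Institute's test forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultDemonstrationResult:
    """Observed outcome of one simulated fault (hypothetical record layout)."""
    name: str
    detected: bool            # the control system detected the fault
    isolated: bool            # the faulty section was isolated from the system
    alert_raised: bool        # an alert was displayed to the operator
    critical_load_kept: bool  # the critical environment kept operating


def unmet_requirements(result):
    """Return the fault tolerance essentials that were not met, in order."""
    checks = {
        "detect": result.detected,
        "isolate": result.isolated,
        "alert": result.alert_raised,
        "continue operating": result.critical_load_kept,
    }
    return [requirement for requirement, met in checks.items() if not met]


def passes(result):
    """A demonstration passes only if all four essentials were met."""
    return not unmet_requirements(result)


if __name__ == "__main__":
    # Hypothetical example: the fault was contained but no alert was shown.
    leak = FaultDemonstrationResult("chilled water leak, side A",
                                    detected=True, isolated=True,
                                    alert_raised=False, critical_load_kept=True)
    print(passes(leak), unmet_requirements(leak))
```

Recording which essential failed, and not only a pass/fail flag, makes it easier to decide whether a repeat demonstration needs a control system fix or a change in the design of the containment itself.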

Communication and cyber issues

This is an evolving domain that at this stage has not been thoroughly tested within the
framework of the certification. Nonetheless, attention should be devoted to it when designing
the systems and the building control system. The communication system between the various
controllers should be resilient to both physical and logical faults. Physical resilience is tested
similarly to the demonstration of other routes. With respect to logical resilience, this does not
currently constitute part of the tier standard, but it should be addressed in the strictest manner,
given that a cyberattack is not currently defined as a fault but may compromise the network
and impact the entire facility, not just a single component.

Description of the facility:


The facility was constructed using a design-build method for an official body of the State of
Israel. As part of the ordering party's requirements, the contractor was required to
obtain TIER IV certification.

The facility is designed to ensure continuous operation for the organization in times of routine
and emergency. The facility consists of several key functions: A computer hall, dedicated
rooms, command-and-control halls, and a supporting energy apparatus. The primary energy
consumer at the facility is the computer hall. The facility has parts that are physically protected
against threats of war and terrorism.

The facility is required to comply with TIER IV. The facility was designed in a full 2N
configuration in the electrical and air conditioning systems. The cooling units in the data center and in
the communication room are based on CRAH and in-row units with dual coils and N+1
redundancy.

Commissioning and preparation for tier certification


On the face of it, there is no connection between the commissioning and certification processes,
and certification can be performed regardless of commissioning. However, the certification
process is complex, and in practice no entity would be able to obtain tier
standard certification without appropriate preparation. This preparation is covered by the
commissioning process.

In this project, the commissioning process was carried out by HIT Cx on behalf of
electromechanical systems contractor Electra M&E. As part of this process, preliminary tests
were performed for each piece of equipment individually (operation and testing – Level 3),
followed by systems testing (Level 4) and finally integrated systems testing of the facility as a
whole (Level 5 – IST). Within each of these stages, the demonstrations required by the UPTIME
Institute were rehearsed, so as to ensure
that during the demonstrations to the Institute the facility would function as required
and prove that it meets its design and operational objectives.

This process is also recommended by the UPTIME Institute. It can be seen in a slide from the
Institute's presentation that the process for obtaining TCCF approval is structurally
integrated in the commissioning process.

Commissioning in the time of Corona


The construction of the facility was completed at the beginning of the year and the contractor
prepared to invite the UPTIME Institute's representatives in order to obtain certification of the
constructed facility's compliance with the standard. The Corona outbreak and the effective
closure of Israel's borders to foreign arrivals mandated a different way of thinking about
how to perform the demonstrations under these constraints. The need to perform the
tests stemmed, among other things, from the owner's desire to populate the facility regardless
of Corona.

We consulted with the UPTIME Institute to explore options and it turned out that they had
performed a similar process for a TIER III facility in the United States. After reviewing our
request, they agreed to perform the certification in a split configuration: the first phase would
be performed remotely, with the process documented by conference call, and in the
second phase, after the pandemic subsides, a team from the Institute would come for final
demonstrations and approval.

The list of demonstrations, which had been forwarded to us about a year earlier, was re-examined
and divided into 3 categories (colors):

• "Blue" tests – must be performed at full load.
• "Red" tests – can be performed at full load.
• "Green" tests – can be performed without load during the future visit to the site.

Since the owner plans to populate the facility, it was clear that during the repeated tests
operational IT equipment would already be installed there. In practice, this situation does not
allow the installation of supplementary dummy loads up to the planned nominal
capacity.

The owner feared, and rightly so, that the performance of tests in an active facility could
jeopardize its operations. It was clarified to the owner that all the demonstrations aim to prove
that the facility is indeed single fault resilient and that none of the demonstrations should
cause it to shut down, certainly not after they had been performed at full load as part of the
remote demonstrations.

Therefore, it was agreed with the Institute that the remote tests would be performed while the
facility is fully loaded by means of load simulators, whereas the repeated tests would be
performed at the load as it would actually be at the time of the demonstrations.

This mandated that the remote tests comprise all the tests that must be performed at full load
("blue" and "red") and only the "green" tests could be performed without load during the future
visit.

The required scope of the tests included a total of roughly 110 demonstrations, of which 75
were for the remote demonstration stage and another 35 could be deferred to the date of
the site visit.
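
The split described above amounts to a simple mapping from color to certification phase. The sketch below shows one way to keep that mapping explicit when tracking the demonstration list. The enum, the function names, and the sample demonstration names are hypothetical and are not the Institute's actual list.

```python
from collections import Counter
from enum import Enum


class Color(Enum):
    BLUE = "blue"    # must be performed at full load
    RED = "red"      # can be performed at full load
    GREEN = "green"  # can be performed without load during the site visit


# As agreed with the Institute, blue and red tests form the remote phase.
REMOTE_COLORS = {Color.BLUE, Color.RED}


def phase_for(color):
    """Map a demonstration color to the phase in which it is performed."""
    return "remote" if color in REMOTE_COLORS else "site visit"


def count_by_phase(demonstrations):
    """Count demonstrations per phase; input is an iterable of (name, Color)."""
    return Counter(phase_for(color) for _, color in demonstrations)


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    sample = [
        ("grid to generator transfer", Color.BLUE),
        ("chiller fault", Color.RED),
        ("signage review", Color.GREEN),
    ]
    print(dict(count_by_phase(sample)))
```

Applied to the full list in this project, such a tally would be expected to give 75 remote demonstrations and 35 deferred to the site visit.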

As preparation for implementation of the remote demonstrations, weekly conference calls
were conducted in order to prepare for all aspects of the testing: coordination of expectations,
defining an exact test protocol for each demonstration, video tests, and conducting a remote
tour of the site while presenting the assemblies and the systems.

The performance of remote demonstrations requires the development of a high level of trust
between the auditors and the demonstrating team. During the initial stages of the tests we
were required to compare the values displayed in the control
system with measurements taken by independent measuring instruments, which were
shown on camera in real time so that the team viewing the demonstrations could see the
actual measurements.
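
A comparison of this kind reduces to checking each displayed value against the independent reading within an agreed tolerance. The sketch below illustrates the idea. The 2% relative tolerance, the point names, and the function names are assumptions made for illustration and are not values required by the Institute.

```python
def readings_agree(displayed, measured, relative_tolerance=0.02):
    """True if the displayed value is within tolerance of the measured one."""
    if measured == 0:
        return displayed == 0
    return abs(displayed - measured) / abs(measured) <= relative_tolerance


def mismatches(points, relative_tolerance=0.02):
    """Return the names of points whose display disagrees with the instrument.

    points maps a point name to a (displayed, measured) pair.
    """
    return [
        name
        for name, (displayed, measured) in points.items()
        if not readings_agree(displayed, measured, relative_tolerance)
    ]


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    points = {
        "supply water temperature (C)": (18.1, 18.0),
        "IT load (kW)": (250.0, 300.0),
    }
    print(mismatches(points))
```

Any point returned by such a check would be investigated before the examiners are asked to rely on the control system screens alone.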

Another inherent difficulty was language. The examining team naturally speaks
English. The control system screens are predominantly in Hebrew and therefore the
displays had to be translated and interpreted so that the audit team would be
persuaded of the system's reliability.

As I noted above, the audit team is a team of two. Due to the novelty of the process, and
since this is the first TIER IV facility in the world to undergo this process remotely, two
inspectors also joined the two examiners. In our case, the two examiners were from England
(a two-hour time difference) and the two inspectors were from the United States (a 7-8-hour
difference). These time differences made it difficult to coordinate schedules, and the tests
spilled into the evening and night hours.
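
The scheduling difficulty follows from simple arithmetic on the time differences quoted above. The sketch below converts each party's working window to Israel time and intersects the windows. The working hours (08:00-17:00 on site, 09:00-17:00 for the remote teams) are assumptions made for illustration; only the 2-hour and 7-8-hour differences come from this article.

```python
def to_israel_time(start_hour, end_hour, hours_behind_israel):
    """Shift a local working window (hours of the day) to Israel time."""
    return (start_hour + hours_behind_israel, end_hour + hours_behind_israel)


def common_window(windows):
    """Return the (start, end) overlap of all windows, or None if there is none."""
    start = max(window[0] for window in windows)
    end = min(window[1] for window in windows)
    return (start, end) if start < end else None


if __name__ == "__main__":
    site = (8, 17)                        # assumed site hours, Israel time
    examiners = to_israel_time(9, 17, 2)  # England, 2 hours behind
    for us_offset in (7, 8):              # United States, 7 or 8 hours behind
        inspectors = to_israel_time(9, 17, us_offset)
        print(us_offset, common_window([site, examiners, inspectors]))
```

Under these assumed working hours, the three teams share a single hour (16:00-17:00 Israel time) with a 7-hour difference and no common window at all with an 8-hour difference, which is consistent with the tests running into the evening and night.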

At the beginning of the tests, there was, as stated, a need to build trust and as time passed the
rate at which the tests were performed increased.

Naturally, faults of varying severity emerge when tests are performed. Most
faults were in the control system display: a mismatch between the fault and its display on
the control screen or in the fault notification. All faults of this kind were corrected immediately
and demonstrated again, and the test was approved.

Furthermore, a routine fault occurred in the cooling unit of the communication room and
shut the unit down. The room, of course, continued to operate (as planned). The fault was presented
to the examiners and it was decided to proceed with testing since the fault was not supposed
to affect the operability of this room.

After two days of an intensive testing process, the demonstration of all the tests that needed
to be performed while the facility operated at full load was completed and an Institute report
was received, which described the tests and confirmed that this stage had been passed
successfully.

Remote commissioning

After the end of the travel restrictions between countries around the world, the team of
inspectors will come to Israel, re-review the demonstrations based on the IT equipment that
will actually be installed, and conclusively approve the facility's tier certification.

Summary
The commissioning of facilities in accordance with the TIER standards of the UPTIME Institute
is a complex process that commences at the design stage and ends after the construction of
the facility.

In Israel, four facilities (two to TIER III and two to TIER IV) have had their designs approved to
date by the UPTIME Institute. Of these facilities, only one has received TCCF approval for the
constructed facility. This article describes the approval process for the second facility. Both
facilities that underwent an approval process for constructed facilities also underwent
orderly commissioning processes. Without the implementation of commissioning processes,
alongside the specification of the facility, its design, and intensive acceptance tests in the last
phase, the facility cannot be guaranteed to meet its objectives. If the facility also
requires TIER certification and approval by the UPTIME Institute, it is not practically possible
to receive certification without guidance via commissioning processes.

It is important to emphasize that even if the facility was designed by a team that has
undergone professional training at the UPTIME Institute, approval of the compliance of the
design and performance with the requirements of the standard can only be performed by
representatives of the Institute itself.

HIT Cx is solely dedicated to supplying commissioning services to critical
facilities. HIT Cx focuses on the availability of critical facilities from
an engineering point of view. The firm's staff has vast knowledge of data centers
and highly protected infrastructures.

The firm's mission is to establish the commissioning methodology in the Israeli
building market, mainly in data centers and command-and-control
facilities, by using international standards and best practices.

Shimon Katz
Co-director at HIT Cx. Has over 30 years of seniority in the design, construction, and operation
of critical facilities and datacenters in the IDF and institutional bodies. Served as Chief
Engineer for Special Projects at Electra M&E. Holds a BA in Electrical Engineering and an MA in
Civil Engineering (Construction Management). QCxP certified by the University of Wisconsin.
ATD certified by the UPTIME Institute. PMP certified by the Project Management Institute.

All rights reserved ©

Tel: +972-4-85507411 www.hit-c.co.il [email protected]
