Update on last Friday’s outage

Posted by Joey Day

Here’s the final report from the data center regarding last Friday’s outage:

We believe that a little after 8:30 P.M. MST on Friday, 1/25/2008, the janitor either accidentally or intentionally triggered the Emergency Power Off (EPO) system for UPS1. As a result of the EPO system activation, all customer equipment connected to UPS1 lost power. The power loss affected both PDU1 and PDU2 and all panels (A through D) on both PDUs. The janitor did not report the EPO system activation and left the site immediately after the EPO system was triggered.

Part of the security system’s communication subsystem was disabled during the EPO event. As a result, customers and engineers arriving at the site were unable to gain access to the facility until an engineer arrived with a physical key. At that time, the EPO system was deactivated. The UPS system did not come back online correctly, so it was bypassed.

Tier Four engineers then helped customers who lost power to get their systems back online.

The following day, Saturday, 1/26/2008, Tier Four engineers met with the UPS vendor to determine why UPS1 did not come back online after the EPO. At that time, the UPS vendor discovered that the three input power phases were not in the order the inverter expects in order to function properly. Typically, the power phases are ordered A,B,C. The phases were instead ordered B,C,A. The phases were reordered last year when the building’s main power feed was upgraded by UP&L. Most equipment was reconnected A,B,C; however, the contractor overlooked UPS1 at that time. Because B,C,A preserves the same rotation sequence as A,B,C, all equipment connected to those phases functions properly because it turns the right direction. However, the UPS system will not start up from a shutdown state if it senses that the input phases are not ordered according to its expectations. When the power upgrade was completed last year, the UPS was never shut down. Therefore, this issue was not discovered until the UPS was shut down by the EPO; when it then tried to start up, it sensed the phase shift and refused to do so.
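To illustrate why a B,C,A hookup can run connected equipment normally yet still fail a strict startup check, here is a minimal Python sketch. It is not the UPS vendor's logic, which the report does not describe; the function names `same_rotation` and `strict_order_ok` are hypothetical and chosen only for illustration. It assumes that "turns the right direction" corresponds to the wired order being a cyclic rotation of the expected order, and that the UPS startup check compares the order position for position.

```python
def same_rotation(actual, expected):
    """Return True if `actual` is a cyclic rotation of `expected`.

    A cyclic rotation (e.g. B,C,A versus A,B,C) keeps the same direction
    of rotation, so three-phase loads still turn the right way.
    """
    actual, expected = list(actual), list(expected)
    if len(actual) != len(expected):
        return False
    n = len(expected)
    doubled = expected + expected
    return any(doubled[i:i + n] == actual for i in range(n))


def strict_order_ok(actual, expected):
    """Hypothetical strict startup check: every phase must be in its expected position."""
    return list(actual) == list(expected)


EXPECTED = ("A", "B", "C")
WIRED = ("B", "C", "A")      # the order found on UPS1's input
REVERSED = ("A", "C", "B")   # for contrast: a genuinely reversed sequence

print(same_rotation(WIRED, EXPECTED))     # rotation direction preserved?
print(strict_order_ok(WIRED, EXPECTED))   # passes a strict position check?
print(same_rotation(REVERSED, EXPECTED))  # reversed sequence for comparison
```

Under these assumptions, the B,C,A wiring passes the rotation check but fails the strict positional check, which matches the behavior the vendor observed: loads ran normally for a year, but the UPS refused to restart.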

As a result of this discovery, it became apparent that the UPS could not be brought online without shifting the input phases to A,B,C. Tier Four engineers, the UPS vendor, and the electrical contractor worked through the day to find a way to bring the UPS online without taking the load offline. However, late Saturday night, the decision was made that no safe method existed to do that without risking irreparable damage to either the UPS or the UPS bypass system. The only safe and guaranteed effective plan was to cut the power long enough to shift the phases. Tier Four determined that the situation was an emergency, because with the UPS bypassed any loss of utility power would also take the load offline, and utility interruptions are unforeseeable. Therefore, the decision was made that no more time could be lost and that the power cut had to happen Saturday night. At approximately 10:30 P.M., the electrical contractor cut the power to UPS1. The phases were then realigned and the UPS was re-energized. The UPS started up correctly this time. This emergency power cut lasted 20 seconds. Because of the emergency status, the lateness of the hour, and the need to perform the work immediately, no advance notification was sent out before the power cut. We apologize to all customers who lost power as a result of this emergency maintenance operation.

We believe at this time that UPS1 is again fully functional. The UPS vendor will be out on Monday to check it out again and ensure that no permanent damage was done to the unit.
