6 Must-Ask Questions to Prevent Data Center Power Outages
Published on June 9, 2017
Once again, a data center’s power failure tops the news, as another airline has its flight schedules delayed and passengers spend hours waiting in chaos. No doubt, power loss brings serious consequences to any company’s reputation and brand, not to mention revenue loss from service disruptions. But there is a way to be more proactive in ensuring the power chain’s integrity, and it starts by asking these 6 key questions:
1. Do I have full transparency into all interconnected devices and systems?
It is absolutely vital that the power chain is documented all the way through: from where power enters the building, through the UPSs and PDUs, down to all rack-mounted equipment. Simply put: you need to know what is connected to what, as well as how the devices depend on one another. With this knowledge, you can understand the potential impact if a certain piece of equipment fails or is taken offline for maintenance. You should also know the maintenance status of each power chain device, e.g. where is it in its useful life?
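As a rough illustration of "what is connected to what", the power chain can be recorded as a simple map from each device to the devices it feeds. The sketch below is only an example: the device names and the `downstream` helper are hypothetical, and it ignores redundant feeds, so it lists equipment that is potentially affected rather than equipment that will definitely lose power.

```python
from collections import deque

# Hypothetical power chain: each device maps to the devices it feeds.
POWER_CHAIN = {
    "utility-feed": ["ups-a"],
    "ups-a": ["pdu-1", "pdu-2"],
    "pdu-1": ["rack-01", "rack-02"],
    "pdu-2": ["rack-03"],
    "rack-01": [],
    "rack-02": [],
    "rack-03": [],
}


def downstream(chain, device):
    """Return every device fed, directly or indirectly, by `device`."""
    seen = set()
    queue = deque(chain.get(device, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(chain.get(node, []))
    return sorted(seen)


# What is potentially affected if pdu-1 is taken offline for maintenance?
print(downstream(POWER_CHAIN, "pdu-1"))  # ['rack-01', 'rack-02']
```

Even a small map like this answers the maintenance question directly: before a device is taken offline, list everything that sits below it.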
2. Am I monitoring my operations in real-time?
Real-time power monitoring (what’s going on in your power chain right now) is critical. You must know, at any given time, how much energy is being used, where, and by which devices. A BMS is very useful, but it is also very specific and tends to keep data siloed. You need to ask yourself, “Do I have the capability to look at all the information, all the infrastructure components in the facility, and see the entire power management system in one place via a single pane-of-glass view?” This holistic view brings real-time monitoring and alarming that enable data center operators to mitigate risks and make changes to avoid disaster.
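To make the single-view idea concrete, here is a minimal sketch that merges readings from separate systems into one list and flags devices running close to their rated capacity. The `Reading` type, the source names, the sample values, and the 80% alarm threshold are all assumptions chosen for illustration, not values from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str      # system that reported the value (hypothetical names)
    device: str
    load_kw: float   # current load
    rated_kw: float  # rated capacity of the device


def alarms(readings, threshold=0.8):
    """Return (device, utilization) pairs at or above `threshold`, highest first."""
    flagged = [
        (r.device, round(r.load_kw / r.rated_kw, 2))
        for r in readings
        if r.load_kw / r.rated_kw >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Readings that would normally live in separate, siloed systems.
readings = [
    Reading("bms", "ups-a", 72.0, 100.0),
    Reading("pdu-snmp", "pdu-1", 9.0, 10.0),
    Reading("pdu-snmp", "pdu-2", 4.1, 5.0),
]

print(alarms(readings))  # [('pdu-1', 0.9), ('pdu-2', 0.82)]
```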
3. Have I documented the data center’s resilience?
The ability to perform power failure simulations by “virtually” switching devices off, without affecting the production environment, lets you build a well-thought-out action plan for recovering services. We’ve all seen the headlines about data center operators who assumed their power chain and backup systems were foolproof without ever testing them. Power failure simulation enables you to locate where redundancy is lacking and uncover single points of failure. Needless to say, it’s imperative to build and document your recovery plan in advance of a catastrophic power failure.
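A failure simulation of this kind can be sketched in a few lines: switch off one device at a time in a model of the power chain and check which loads no longer have any path back to the source. Everything below (the `FEEDS` topology, the device names, and both helper functions) is a hypothetical example rather than the behavior of a specific tool.

```python
# Hypothetical topology: rack-01 is dual-fed, rack-02 has a single feed.
FEEDS = {
    "utility": ["ups-a", "ups-b"],
    "ups-a": ["pdu-1"],
    "ups-b": ["pdu-2"],
    "pdu-1": ["rack-01", "rack-02"],
    "pdu-2": ["rack-01"],
}
LOADS = ["rack-01", "rack-02"]


def powered(feeds, source, failed):
    """Return the devices still reachable from `source` when `failed` are off."""
    if source in failed:
        return set()
    live = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        for child in feeds.get(node, []):
            if child not in failed and child not in live:
                live.add(child)
                stack.append(child)
    return live


def simulate_failures(feeds, source, loads):
    """Virtually switch off each non-load device and report the loads it drops."""
    devices = set(feeds) | {c for children in feeds.values() for c in children}
    report = {}
    for device in sorted(devices - set(loads)):
        live = powered(feeds, source, {device})
        lost = [load for load in loads if load not in live]
        if lost:
            report[device] = lost
    return report


print(simulate_failures(FEEDS, "utility", LOADS))
# {'pdu-1': ['rack-02'], 'ups-a': ['rack-02'], 'utility': ['rack-01', 'rack-02']}
```

In this toy model, every device that appears in the report is a single point of failure for at least one rack, which is exactly the list a recovery plan needs to address.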
4. Are all my stakeholders on the same page?
IT personnel and facility managers must work together and share information; IT as a service has to be brought into the facilities side. This helps avoid situations such as overloads that occur because new equipment was brought in without facility managers knowing about it or being able to support it properly. Again, documentation is key so that consistent information is shared across the organization. And consistent information sharing allows everyone to look back on what’s been done and improve procedures to avoid future disruptions.
5. Can I identify the changing trends in my power system and respond accordingly?
This is the flip side of real-time monitoring. As critical as up-to-the-minute information is, it’s also vital to be able to analyze data center performance over a long period of time, so trends and patterns can be pinpointed for easier long-term forecasting. With that analysis, you can plan for change and fluctuations, balance load, predict future capacity needs, plan workflow, and schedule service.
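As a simple example of trend-based forecasting, the sketch below fits a straight line to monthly peak load and estimates how many months remain before a capacity limit is reached. The sample figures, the 60 kW limit, and the `months_until_capacity` helper are made up for illustration, and a linear trend is an assumption that real load data may not follow.

```python
def linear_fit(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_capacity(monthly_peak_kw, capacity_kw):
    """Estimate months from the latest sample until the trend line hits capacity.

    Returns None when the load is flat or falling.
    """
    slope, intercept = linear_fit(monthly_peak_kw)
    if slope <= 0:
        return None
    month_at_capacity = (capacity_kw - intercept) / slope
    return max(0.0, month_at_capacity - (len(monthly_peak_kw) - 1))


# Hypothetical monthly peak load (kW) over six months, against a 60 kW limit.
history = [40, 42, 44, 46, 48, 50]
print(months_until_capacity(history, 60))  # 5.0
```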
6. What is the overall vulnerability of my power chain?
Traditionally, security has been the focus of the IT department. But what about the security concerns of facility managers? Many more data center devices connect to a network than just what’s contained in the racks; there are terminals and points of access everywhere. You should also ask:
- When was the last time your passwords were changed?
- Can an outside contractor have access to a device that can shut down your whole building or transfer load?
Again, documentation and control are critical to preventing disasters; more hardware is not the answer to preventing catastrophic power outages. In fact, adding hardware can make the control problem worse, because each new device is one more thing to document and secure.
Data Center Infrastructure Management (DCIM) software offers a proven approach to power management. DCIM enables IT and facility personnel to run the data center at peak efficiency, and it lets all stakeholders improve overall operations while identifying vulnerabilities, keeping the power chain safe.
With a DCIM solution deployed, you gain full visibility into data center operations, which helps eliminate the communication silos between IT and facilities by sharing real-time data via easy-to-understand charts and graphs. Let’s face it: preventing catastrophic power outages is a lot easier when the proper DCIM software solution is in place and connected to critical assets.