Amazon Cloud Outages – How DCIM Can Help

In light of last month’s Amazon cloud outages, it is worth asking what Amazon, or really any cloud provider, should be doing to mitigate such risks. In general, cloud service providers tend to focus on IT management and virtualization. But what about the physical infrastructure itself: how is it being managed, and after a catastrophic failure, how can it be brought back online as quickly as possible? Cloud service providers may well be at risk in how they manage their infrastructure. That is, do they have proper processes, procedures and methodologies in place?

In the most recent Amazon outage, according to the Wall Street Journal, “Generators kicked in but failed to stabilize the load. Power went off to part of the data center. Then a software bug delayed recovery.” No doubt a Data Center Infrastructure Management (DCIM) solution could have helped mitigate this. DCIM can track the power chain, and it can also remind IT and Facilities teams to simulate a generator failure ahead of time, under controlled conditions when traffic is known to be low, so that the risk is identified and eliminated before a real outage. Redundancy may be mandated by the company, but only good process management, the kind that a DCIM solution can provide (among other capabilities), can enforce it.
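The idea of scheduling a simulated generator failure for a low-traffic period can be sketched roughly as follows. This is only an illustration, not how any particular DCIM product works: the `Generator` record, the function names, the hourly load figures and the 30-day test interval are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Generator:
    """Hypothetical record for a backup generator in the power chain."""
    name: str
    last_tested: date


def quietest_window(hourly_load: list[float], window_hours: int) -> int:
    """Return the start hour of the contiguous window with the lowest total load.

    The window is allowed to wrap past the end of the list (e.g. 23:00 to 01:00).
    """
    n = len(hourly_load)
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_load[(start + i) % n] for i in range(window_hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


def overdue_generators(generators: list[Generator], today: date,
                       max_age_days: int = 30) -> list[str]:
    """Return names of generators whose last test is older than max_age_days.

    The 30-day default is an assumed policy, not an industry rule.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [g.name for g in generators if g.last_tested < cutoff]


if __name__ == "__main__":
    # Made-up average load per hour of day, as a percentage of peak.
    load = [40, 30, 20, 10, 10, 20, 40, 60, 80, 90, 95, 100,
            100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
    fleet = [Generator("GEN-A", date(2024, 1, 1)),
             Generator("GEN-B", date(2024, 3, 1))]

    start = quietest_window(load, window_hours=2)
    for name in overdue_generators(fleet, today=date(2024, 3, 15)):
        print(f"Schedule failover test for {name} starting at {start:02d}:00")
```

The point of the sketch is that the reminder is driven by process data (when each generator was last tested, when traffic is lowest) rather than by a real-time alarm, which would only fire once the failure is already under way.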

There has been too much focus lately on DCIM as a monitoring mechanism only. The problem is that real-time monitoring comes too late to solve problems like the one that happened at Amazon. For DCIM to provide real value, it has to be more about establishing good processes and procedures that help run an efficient, lower-cost and lower-risk data center by avoiding problems before they happen.

