Last month’s EC2 outage at Amazon prompted a lot of scaremongering around cloud computing’s readiness for prime time, and a lot of (justified) reflection on the merits of current cloud service-level agreements (SLAs). A network configuration change caused problems for EC2’s Northern Virginia data center, impacting availability of sites like Foursquare and Reddit. Other EC2 users, like Netflix, that had backup server instances in other ‘availability zones’ were able to ride out the storm and continue to provide service. So…infrastructure-as-a-service is still subject to the laws of physics and IT fallibility. What’s the real newsflash here?

Cloud computing may be making IT a network resource, but resiliency and reliability are just as big a concern in the cloud as they are in traditional IT infrastructure. As Lew Moorman, chief strategy officer of Rackspace, explained, “The Amazon interruption was the computing equivalent of an airplane crash. It is a major episode with widespread damage. But airline travel is still safer than traveling in a car.” Combine that safety argument with the undeniable cost savings of outsourcing data centers, and you can see why cloud computing will continue to be a compelling business strategy despite this very public outage.

The biggest challenge for users is negotiating and enforcing appropriate cloud SLAs. Interestingly, since Amazon’s SLAs commit to 99.95% uptime for customers with deployments in multiple ‘availability zones’, they were still in compliance despite the N.Va site outage. SLAs obviously need to evolve to cover more granular quality metrics, but it will be difficult to reach agreement on measurable performance indicators that are meaningful to every customer. Beyond a very basic measure like uptime, different breeds of enterprises will have vastly different quality-of-service requirements, and negotiating agreements with cloud providers will require a deep understanding of your critical applications, as well as honest reflection on what you can and can’t live without for a short time. For example, someone hosting their retail business in the cloud might be more concerned about concurrent connections and connections per second to their virtual web server, whereas a content provider like Netflix might care more about video-quality metrics. Documenting performance expectations, testing to verify delivery, and enforcing agreements will be no mean feat, but will be key to building confidence in cloud computing for business-critical applications.
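To make an uptime commitment concrete, it helps to turn the percentage into a downtime budget. The sketch below is only an illustration: the function names are hypothetical, the year is assumed to be 365 days, and the percentages are sample values rather than the terms of any particular provider's agreement.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8760 hours


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return how many hours of downtime an uptime commitment permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


def meets_sla(observed_downtime_hours, uptime_percent, period_hours=HOURS_PER_YEAR):
    """Check whether observed downtime stays within the SLA's budget."""
    return observed_downtime_hours <= allowed_downtime_hours(
        uptime_percent, period_hours
    )


if __name__ == "__main__":
    # Illustrative uptime levels only, not quotes from a real contract.
    for pct in (99.5, 99.9, 99.95):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

A 99.5% annual commitment leaves a budget of nearly two days, which is one reason a single uptime number says little about whether a given business-critical application was actually usable.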

So what conclusion can we draw from all of this? With IDC predicting that corporate cloud computing will continue to grow by more than 25% a year to $55.5 billion by 2014, we can assume that cloud performance measurement and SLA enforcement will be critical issues for many years to come. Measuring end-user quality of experience and network resiliency is ‘bread and butter’ for Ixia, so this cloud definitely has a silver lining: Amazon’s public woes are a great reminder to us all of the value of testing.
