Safe Landings in the Cloud
My first attempt to migrate a legacy website to the cloud was a disaster. We had a consumer-facing website that was highly optimized (translation: hacked by a well-meaning sys admin) to run well on the hardware and OS on which it was originally deployed. The hardware was getting a bit long in the tooth, and the platform was years out of date. In a nutshell, it was time for a massive upgrade of infrastructure and platform.
The direct replacement cost for the hardware was well outside the company’s budget, and financing a large capital purchase was not an option. Managed service providers offered dedicated hardware or virtual machines (VMs), with promises to relieve us of the burden of operating and maintaining the infrastructure. However, the total cost over the term of the agreement exceeded the combined cost of buying the equipment and paying a full-time sys admin’s salary by a factor of four. To complicate matters, the number of both free and paid-subscription visitors had been growing faster every year, and the solution had to account for anticipated growth over the next three to five years. At this point, the siren song of the cloud was too alluring to ignore, and I made the decision to move to the cloud. It was time to pack up and move.
Our plan was simple: select a cloud provider, launch virtual counterparts to our physical infrastructure, install the latest environment, restore the site to the new virtual environment, and launch.
At first, everything seemed to work just fine. We launched late in the evening when traffic was low. By morning, as site traffic grew, the picture quickly changed, and it was ugly. The site was unreachable by noon, and a firestorm of angst spread through the company. I was determined to get the site working without rolling back. We shut the instances down and resized them to the largest size our cloud provider offered. After they booted up, everything seemed fine. Within hours, however, the site was once again unreachable. I had no choice but to roll back to our old physical infrastructure.
Finding the root cause proved more challenging than I’d anticipated. None of us had experience working in the cloud, and our understanding of the performance characteristics of cloud infrastructure was limited. Our diagnostic process was constrained by what we knew about dedicated hardware on a private network, and much of that knowledge did not carry over to the cloud. We struggled to make the shift to cloud-think. Then came the light bulb moment. In the cloud, we had split our web server and database server across two instances, just like our physical deployment architecture. In the physical world, the two servers sat side by side in the same cabinet, connected over a dedicated gigabit LAN. Disks were attached directly to the database server.
In the cloud, on the other hand, the instances were separated by a half dozen network hops, and the disks were virtualized. As traffic to the site ramped up, so did the chatter between the web application and the database, and we ran smack into a performance bottleneck that rendered the site unreachable. The quickest fix was not particularly attractive: run the web application and database server on a single monolithic instance. Even that did not completely solve the problem, because the disks were still virtualized and running on a storage area network (SAN). Compounding matters were the high per-minute fees our cloud provider charged for such instances, which undermined the financial value proposition of the move to the cloud. I lost support for the project, and we moved to Plan B.
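To see why chatter that was harmless inside one cabinet became crippling across several hops, a rough back-of-envelope model helps. The sketch below assumes a page issues its database queries one after another, so each query pays a full network round trip, and it ignores CPU, disk, and queuing entirely. The query count, worker count, and round-trip times are hypothetical illustrations, not measurements from our site.

```python
# Back-of-envelope model of a "chatty" web app talking to a remote database.
# All numbers below are hypothetical, chosen only to illustrate the effect.

def db_wait_ms(queries_per_page: int, round_trip_ms: float) -> float:
    """Network wait per page when queries are issued sequentially."""
    return queries_per_page * round_trip_ms


def max_pages_per_second(queries_per_page: int, round_trip_ms: float,
                         workers: int) -> float:
    """Throughput ceiling if each worker renders one page at a time and the
    only cost is network wait (CPU, disk, and queuing are ignored)."""
    per_page_s = db_wait_ms(queries_per_page, round_trip_ms) / 1000.0
    return workers / per_page_s


QUERIES = 50   # hypothetical queries issued per page view
WORKERS = 20   # hypothetical concurrent application workers

for label, rtt in [("same-cabinet LAN", 0.2), ("cloud, several hops", 2.0)]:
    wait = db_wait_ms(QUERIES, rtt)
    cap = max_pages_per_second(QUERIES, rtt, WORKERS)
    print(f"{label}: {wait:.0f} ms waiting per page, ~{cap:.0f} pages/s max")
```

Under these made-up numbers, a tenfold increase in round-trip time cuts the throughput ceiling tenfold before disk or CPU even enter the picture. Resizing the instances leaves the round-trip time untouched, which is why bigger instances alone could not rescue a chatty architecture.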
In retrospect, the decision to move to the cloud was predicated on the assumptions that cloud computing is simply a lower-cost alternative to physical computing, that the two are otherwise essentially the same, and that the easiest and safest path to the cloud is to mimic one’s physical infrastructure (I am referring specifically to the form of cloud computing called IaaS, or Infrastructure as a Service). These were naive assumptions, of course. Success in the cloud starts with understanding the differences between cloud and physical computing. It continues with knowing your own application’s architecture, in particular its performance dependencies: disk I/O speed, network bandwidth, latency and QoS requirements, CPU and memory speed, and so on. If you can’t meet these requirements in the cloud, then it’s important to rethink (and rework) your application and deployment architectures. And if you are going to rework your application’s architecture, you should also consider going beyond IaaS to PaaS (Platform as a Service).
Having learned my lesson, I have found my experiences in the cloud since that fateful first attempt far more rewarding, and the cloud’s many promises have held true. I am a strong proponent of cloud computing. That said, it is not for everyone, nor is it suitable for every application. Prepare, choose wisely, and make your move. Best of luck, and may you have a soft landing in the cloud.