Building bullet proof applications on public cloud

Remember Hurracane Sandy?

We all remember when Hurricane Sandy took AWS and many other US East Coast data centres offline. Those were really painful and expensive days for many of us. Luckily disasters like Sandy happen rarely. However we have all seen how upgrades, hardware faults or network issues can take our sites and mobile backends offline.

Making Web Apps bullet proof

Imagine you had your site hosted on AWS US East when it was taken offline. Having a standby copy of it on DigitalOcean US West Coast could have really helped you.

Let's see how we can make that happen. There are three parts to achiveing this goal:

Code
Data
Traffic

Code

To build a standby stack you need to deploy your code to a new set of servers. This can be tricky since all your deploy scripts can be tied to single cloud provider or fixed set of IPs. Also most probably you have deploy scripts that deploy your code to existing servers but they don't build new servers for you.

Being able to deploy your code to a new set of servers is the most critical part of recovering from down time. The problem is, in most cases you don't know if the whole thing works end-to-end until it is too late.

The key to this step is to build immutable infrastructure.

Using tools like Chef can help by allowing you to automate a full stack build and deployment. You need to make sure the scripts can run on multiple cloud providers and are kept up to date as your application evolves.

Cloud 66 can also help you with this part as it runs on all major cloud providers and doesn't need any update to your scripts. It only uses your code to determine what needs to be deployed. Clone your stack and you're there!

Data

Data is always tricky to move. Big databases can take hours to move and you can't afford to be down for hours. The best strategy here is to setup DB replication to keep your data warm in your standby stack. Setting up and monitoring a DB replication can also be tricky. How do you know your DB replication is working and your replica is not off sync when you need it?

There are guides around on how to setup replication for MySQL, PostgreSQL and MongoDB as well as Redis.

Altertively, you can use Cloud 66 database replication across two stacks. With cross-stack DB replication, Cloud 66 replicates your MySQL, PostgreSQL, MongoDB or Redis databases with a couple of clicks. But we don't stop there, we also monitor your replications and let you know if the replica is out of sync with the master before it's too late.

Traffic

Now that you have a working stack with production data standing by all you need is to switch your traffic to the new stack.

The best practices here are:

Always have a load balancer so you don't need to change your DNS in most cases.
24 hours before switching over your traffic, reduce your DNS record TTL to 5 minutes.
On the day of the move, switch the DNS record and increase the TTL back to 24 hours.

The issue with this approach is that when there is an emergency you don't have 24 hours.

This is where Cloud 66 ElasticAddress can help. ElasticAddress is a fast response DNS server with 100% uptime that quickly diverts your traffic from one stack to another. All you need to do is to point your domain to your ElasticAddress. ElasticAddress automatically tracks your web servers so if you add a load balancer it will update without your intervention.

Summary

Creating high availability applications takes a lot of engineering and even more planning. Not only that, your operations have to be always reflecting the changes in the development side of things to keep high availability an objective.

Ensuring your automation scripts are always up to date and can work with multiple cloud providers, your data is always replicated and your traffic can be quickly and easily switched over is the best place to start planning for emergencies and achieving high availability.