Tools of the new datacenter

Scripting Everything!

Everybody knows the benefits of having a scripted infrastructure. Tools like Chef, Puppet, Ansible or SaltStack all help you with scripting your infrastructure.

You can go further and say that not only is scripting your infrastructure the right way to go, but that immutable infrastructure is what we need: re-creatable and disposable infrastructure where you don’t modify the state of your infrastructure to reflect outside change, but rebuild it from scratch every time.

Although I agree with both of the above practices, I think this model is broken and is breaking more every day.
To explain why, it's better if we first align our understanding of the definition of “datacenter”, or more precisely its evolution over time and what I call the “New Datacenter”.

The New Datacenter

There was a time when your datacenter was where your servers were, in your rack and in your own building. You had to arrange for the power, cooling and network your datacenter needed. Over time this changed: after a while it was still your servers, but in somebody else's rack and their building.
Nowadays, with cloud, none of it is “yours”, and this goes beyond whose asset sheet all of this is written on, to who is “responsible” for it. In the IaaS world, it certainly is not us anymore, and everybody is thankful for that.

I believe this goes beyond the servers, disks and network:
If you think about the datacenter as the infrastructure that powers your application, naturally you have to think about everything that makes the application work but is not part of the code.

Services like hosted source control providers, database-as-a-service, storage services, error-handling services, alert and notification providers and much more all power our applications, and I would consider all of them part of the New Datacenter.

Scripting is broken

So why is infrastructure scripting broken? In an ideal world you write a bunch of scripts that define your infrastructure and get you from nothing to A, where your app is up and running. After that you have to react to the changes happening around you: code changes, version updates and new business requirements all mean writing migration scripts to take you from A to B and from B to C.

Even if you and your team are extremely disciplined and keep your scripts clean and up-to-date, there's always a high chance of the scripts not working the next time you run them. Not because your scripts are broken, but because of all the external dependencies you cannot control: Linux kernel updates from your cloud provider, apt repository changes or new library releases with loose dependency definitions are just a few of the things you cannot control, that nobody warns you about, and that you cannot prepare for. They always get you at the worst time possible: when there is a fire.
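For illustration, even a single unpinned package in a build script leaves you at the mercy of upstream releases. A hypothetical example (the package name is made up):

$ gem install some-gem            # unpinned: installs whatever version shipped today
$ gem install some-gem -v 1.2.3   # pinned: repeatable, survives upstream releases

Scripts full of the first kind work fine for months, then fail on the next fresh build.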

As the number of external dependencies in the new datacenter grows, the probability of your scripts breaking grows with it. As we lose full control over the entire stack, we risk being more exposed to unwanted and unannounced changes we are not prepared for. Legal SLAs and contract-based development are not going to solve most of those issues either.

Tools like Chef and Puppet have their roots in pre-cloud days and are great for describing our desired infrastructure. But as we buy more of our infrastructure as services, scripting our desires is going to mean less and less, and therefore the old tools that suited the days of full-stack control are not going to cut it anymore.

When I look at our customers and how we help them grow and stay up and running, I see that what we do is not writing clever scripts or making sure they are up-to-date. Instead, we stay on top of the changes happening in this growing network of external dependencies and make sure everything we do is always compatible with the rest of what our customers use. By staying ahead of our customers we can be sure that the next time they scale or rebuild their servers, they can do so without being caught out by a broken image or an out-of-date library.

Building tools for humans

For a large part, this is not a problem to be solved by automation. There is a limit to what computers and automation can do. The rest lies with standards, documentation, agreements and contracts, but most importantly with applying human intelligence when all else fails. That is why, as the number of nodes in our networks of interdependent infrastructure components grows, so will our dependence on human intelligence to identify and resolve issues. And that is an expensive problem to solve.

The tools of the new datacenter are not only automation tools built for the cloud era or runtime-bundled containers, but those that make it easier for humans to detect and rectify issues at a larger scale, and at a cost that doesn’t grow linearly with the size of the infrastructure.

The sooner we leave the world of scripting, the better we can all embrace the true power of elastic infrastructure as a service!

Automatic database replication for MySQL, PostgreSQL, MongoDB and Redis

We are happy to announce that Cloud 66 Database Replication is now even more powerful than before.

Database Replication is designed to help you scale your database and improve your service availability simply by setting up master/slave replication.

You can set up database replication in a single stack or between two separate stacks.

Why do I need replication?

Setting up replication for a single stack can improve your database performance as it allows you to split your read and write operations between two different databases. In this scenario the master is used for read and write operations while the slave is read only. High read/low write apps benefit most from this setup.

However, replicating databases between two stacks (cross-stack replication) allows you to achieve minimal downtime when you need to move your application from one stack to the other. The two stacks can reside in different data centers, giving you a failover setup that improves availability and resilience: you can move your stacks with minimal downtime, and you can keep the failover stack in a different region.

How does it work?

To get started, you first need to make sure you have Managed Backups available on your stack. Here you can find more information about managed backups. Once your managed backups are configured, you can set up database replication with ease.

Let's see what happens behind the scenes:

  1. We take a full backup of the master database server in your source stack.
    Single stack: we create a secondary database server in your cloud and restore your backup on it.
    Between two stacks: we restore your backup on the secondary database server in the target stack.
  2. The secondary database is configured to be a slave of the source database.
  3. The source database is configured to be a master of the secondary database.
  4. The relevant environment variables are updated for use in your code and scripts (see the example after this list).
  5. Your replication is monitored by Cloud 66, and you will get alerts when there is an issue with it.
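As a minimal sketch, assuming a MySQL pair, the updated variables might look like this (the names and addresses are illustrative, not the exact values Cloud 66 sets):

MYSQL_ADDRESS=10.0.0.5          # master: read/write
MYSQL_SLAVE_ADDRESS=10.0.0.6    # slave: read-only queries

Your code can then direct writes to the master address and reads to the slave address.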

Database replication on a single stack

Click on your database group in your stack (MySQL, PostgreSQL, MongoDB or Redis) and then click on the green Scale Up button (you need to have Managed Backups set up to see this button).

This begins the replication process.

And all done!

Wasp has become a master and Badger is now a slave.

Database replication between two stacks

Note: This feature is available for MySQL, PostgreSQL and Redis.

Imagine you have two stacks: one in the Netherlands (let's call it Elephant) and a second in the USA (let's call it Mouse). Now you decide to replicate the database from the stack in the Netherlands to the stack in the USA. This is what you do:

Note that the source stack (Elephant) needs to have managed backups enabled.

Click on your database server under the database group in your secondary (Mouse) stack:

Next, on the right hand side, click on the Configure Data Replication button.

When you click on it, an orange box appears. Select the source stack (Elephant) in the drop-down and click OK.

Now the process of database replication between the stacks begins.

A few minutes later all is done.
The database on Mouse is a slave of the database on the Elephant stack and is read only.

The database on the Elephant stack is read/write.

Promoting DB slave to DB master using the Toolbelt

Let's take it one step further: you have replicated your database between the two stacks and now want to switch your slave database to be the master (for example, as part of disaster recovery).

How can we do that?

The Manual way

  1. Manually change the configuration files.
  2. Repoint your environment variables.
  3. Restore the database.

Using Cloud 66 Toolbelt

Simply promote the database on the Mouse stack (in the USA) and use it as the new master.

How to use it?

To promote a slave database to become a standalone database server you can use the Toolbelt (cx). Here is how it works:

Assuming your primary stack (Elephant) has database replication configured we can use the Toolbelt like this:

$ cx slave-promote [-s <stack>] [--db-type <database type>] <slave server name>

To promote the slave database on the Mouse stack, use the following command:

$ cx slave-promote -s Mouse shark

If you have more than one database with replication, you can specify which one you want to promote:

$ cx slave-promote -s Mouse --db-type postgresql shark

The valid db-types are mysql, postgresql and redis.

Resyncing replicated databases

Sometimes it is possible for replicated databases to go out of sync. You can manually resync them with the following command:

$ cx slave-resync [-s <stack>]  [--db-type <database type>] <slave server name>

For example:

$ cx slave-resync -s Mouse --db-type postgresql my-slave-server-name

The Toolbelt allows you to automate the process and helps you save time.

Tip: we suggest performing this action during a quiet period, as it could result in application downtime. You can place your stack in maintenance mode to ensure a better experience for your visitors.

Check out our help page explaining the Toolbelt, and learn more about Database Replication.

Say hello to Azure!

We are thrilled to announce that a new cloud vendor has joined the Cloud 66 family: Microsoft Azure. Azure combines VMs running Linux and Windows on a mix of IaaS and PaaS, providing a unique combination of feature sets to its customers.

When we ran our “Which Cloud Provider Should We Support?” survey a while back, Azure was the clear winner with 22.9% of the votes. So, as usual: you ask – we do!

When should you use Azure?

Microsoft has been very generous with their startup cloud credits (usually $60k), which you can now put to good use by deploying your awesome apps on Azure with Cloud 66.

Another advantage of the Azure Cloud is the ability to mix Linux and Windows servers on your cloud. This will give you more freedom and flexibility.

Let's get started

We are very excited about the Cloud 66 and Azure partnership and we are here to help our customers explore Azure – so let's get started!

To get started, simply choose Azure from the list of cloud providers and deploy your stack.

Enjoy, and as always, please send us your feedback!

Monitoring your CPU, memory and disk space is even better now

We wanted to deliver an even better monitoring system than was previously available, so we have replaced the whole system. You now receive improved metrics with up-to-the-minute updates.

You can find the Metrics chart on your server page, where you can choose the period of time you would like to review.

Monitoring CPU

As before, CPU usage is reported in “jiffies”, the kernel's unit of scheduling time, rather than in percentages. Most Linux systems have around 100 jiffies per second, so, for example, a process consuming 50 jiffies of CPU per second is using roughly 50% of one core; because the exact rate varies between systems, the raw values are not necessarily meaningful as percentages.

Monitoring Memory Usage

The values are reported by how the operating system uses the memory: used, buffered, cached and free.

Monitoring Disk Usage

It shows the same information as running the df command directly on your server (divide the value given by 1024 to get MB).
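For example, a df run might return something like this (the values are illustrative):

$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1      41251136 10485760  30765376  26% /

Here the used value is 10485760 KB, and 10485760 ÷ 1024 = 10240 MB.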

Now you will be able to receive more consistent and accurate data.

Always happy to help. Enjoy!

Better visibility on your Cloud 66 account with Audit Logs

We are thrilled to introduce to you a new feature available now at Cloud 66: Audit Logs. Audit Logs automatically log all user activities across all of your stacks and team members. Audit Logs provide you with all the information you need for each activity, such as IP address, geo-location (accurate to the city) and the team members involved.

Using Audit Logs

You can find Audit Logs under your account menu; only the account owner has access to them.

Audit Logs allow you to search by activity name and to specify the period of time you would like to review.

Then you will be able to see all the activities that occurred on your account.

We think Audit Logs is a great way to add more visibility to your infrastructure, team activities and application deployment.

Checking your Audit Logs on a regular basis is a great way to ensure security of your Cloud 66 account.

Learn more about Audit Logs

Enjoy!

Deploy from a Git Ref

Deploying the same code to all of your servers helps maintain consistency in your infrastructure.

How do we ensure code consistency across your stack?

When you start a deployment on Cloud 66, here is what happens:

  1. The latest commit of your git/branch is pulled to Cloud 66.
  2. The code is analysed for changes that will require new components to be installed on the servers.
  3. Any changes to infrastructure are applied based on the results of the analysis.
  4. In the case of serial deployments, one web server is taken offline and the latest code is pushed to it; it is then put back online (behind the load balancer). This process is repeated for each web server.
    In the case of parallel deployments, the code is pushed to all web servers at the same time.
    Backend servers go through the same process, but without the load balancer modifications.
  5. Final steps of the deployment are taken (cleanup, relevant deploy hooks, notifications, etc).

When a new server is added to a stack, instead of pushing the latest code to the new server, the SHA hash of the git commit running on the other servers is used to ensure the new server runs the same code as the others.

Support for deployment from git refs

Today we are announcing support for deployments from custom git refs.

Using deployment from git refs you can use any valid git ref, like a commit SHA hash, git tag or branch to tell Cloud 66 what code you would like to deploy to your servers. You can use the Toolbelt with this feature. Here is how:

Deploy from a tag:
cx redeploy -s mystack --git-ref v1.22

Deploy from a commit SHA hash:
cx redeploy -s mystack --git-ref a57b7b025b

Deploy from a branch:
cx redeploy -s mystack --git-ref new_staging

This feature follows the same principles to ensure code consistency across your stack as a normal deployment.

Deployment from a git ref takes care of branch switching and cache invalidation automatically. For example, if the given commit SHA hash is not part of the currently deployed git branch, the stack branch will temporarily switch over to the one containing that commit. The next normal deployment (without a git ref) will reset the branch back to its original value and deploy from the HEAD of that branch.
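For example, deploying a hotfix commit and then returning to normal could look like this (using the same hypothetical stack as above):

$ cx redeploy -s mystack --git-ref a57b7b025b   # temporarily switches to the branch containing this commit
$ cx redeploy -s mystack                        # the next normal deploy resets to the original branch HEAD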

We know many customers have wanted to get their hands on this feature, and we hope it helps them build even more great applications!

Cloud breaks, your app shouldn’t

Cloud Breaks

This week we all witnessed another major cloud outage. This time it was Microsoft Azure. According to Microsoft, the outage was caused by an undetected bug that was rolled out to production affecting the block storage used by almost all Azure services.

Cloud outages happen. There have been too many of them to keep track of. Not all outages are major, but every day we see many small incidents affecting a small number of services.

I think the best way to mitigate the risk of cloud outages is to first accept that no matter how hard cloud providers try, there are always going to be issues affecting their customers' availability and performance. Once we accept that as a fact of life, we can try to find a way to protect ourselves against it.

Hybrid Cloud, the hype and the reality

Hybrid cloud, where you weave multiple cloud providers into one infrastructure, has been talked about for a long time now. Frankly, I have not seen any successful production stacks deployed on a hybrid cloud, nor have I seen much desire from customers to do so.

I believe this is because most hybrid cloud solutions focus on merging multiple clouds into one infrastructure, rather than on building an immutable infrastructure setup that can be replicated quickly and easily on any cloud provider. The latter, I think, is the best way to leverage multiple cloud providers to achieve high availability and performance.

An immutable stack is one that is not modified but rebuilt every time there is a need for change. If you can build your entire stack quickly enough on any cloud provider, you can switch your users away from the one suffering an outage within an “acceptable” length of time, depending on the nature of your business.

Discipline requires tools

Building an immutable infrastructure requires both discipline and tools. Good tools help people enforce good practices more easily.

Instead of policing your devs not to jump on servers and run shell commands directly, make it very easy for them to script out what they want done on a server. Instead of constantly asking everyone to write rollback scripts for their migrations, make it easy to redeploy the whole stack from scratch as quickly as possible.

Not every solution is applicable to every situation. There will always be situations where you need to write migration or rollback scripts. We will always need to put out fires or debug issues by having direct access to servers. But let's focus on the majority of cases. Let's build our tools, and therefore our disciplines, for the 80% of cases instead of getting hung up on the other 20%.

What makes an immutable stack?

A few principles can help us achieve immutability in our infrastructure:

  1. Limit the “sources of truth” to as few as possible.
    If you have application code running on servers, Chef scripts to build and modify those servers, VM images built by your build system and database migration scripts, then you have four sources of truth. They can, and will, go out of sync over time. You can police this and put safeguards in place to minimise the possibility, but it's always better to have fewer “sources of truth”.
    Moreover, you can be sure that migrations never run end to end from start to finish. Dependencies change and components get updated so often that a migration that took your servers from state A to state B six months ago is almost guaranteed not to run on fresh new servers now.

  2. Strive to make your data store as reproducible as possible. This sounds easier than it is and can be the most expensive part of your immutable infrastructure. Many Ops teams consider data outside the remit of building infrastructure, and we simply don't have good tools for data + schema version control.
    Taking pre-deployment backups is one strategy, although it requires good data store design and backup policies that can be performed quickly enough. Beyond databases, other data stores (like S3 or other cloud-based block storage systems) can have version control and rollbacks.
    However, when it comes to resilience against cloud outages, multi-cloud data availability is the most important part of the solution. Setting up database replication across cloud providers and data centres can help with that aim. In this scenario you constantly keep your data “warm” in multiple locations, prepared for a failover.

  3. Make sure you can redirect traffic as quickly as possible. Using fast-response DNS services with low record TTLs can help in many cases, although not everything between your servers and your users will honour TTL values. Still, a combination of low-TTL DNS records, traffic proxy services like Cloudflare and maximising the use of a CDN for your static assets covers the vast majority of cases (see the example below).
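As a quick check, you can see the TTL that resolvers get for a record with dig (the domain and values here are illustrative):

$ dig +noall +answer www.example.com
www.example.com.  60  IN  A  203.0.113.10

The second column is the TTL in seconds; keeping it low (60 here) means resolvers re-query sooner, so a failover to a new IP address propagates faster.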

Combining these three principles can get you a long way in your quest for high availability and protection against cloud outages. At Cloud 66 we build tools to help you build immutable infrastructure, and we would love to hear about your experiences with building highly available applications.

Codeanywhere, Deploy anywhere

Today we are announcing our partnership with a great company: Codeanywhere.

Codeanywhere lets developers code in the cloud using their awesome online code browser, code editor and sandbox development boxes, complete with full shell access, syntax highlighting and syncing with Dropbox and Google Drive.

[Image: Codeanywhere IDE]

Cloud 66 + Codeanywhere

Codeanywhere makes it extremely easy to develop in the cloud, and now you can deploy your code directly from your Codeanywhere DevBox to any cloud provider with a single line using Cloud 66.

Making this work is very simple:

$ cd ~
$ wget https://app.cloud66.com/toolbelt/linux -O cx
$ tar -xvf cx

Now the Cloud 66 Toolbelt is available under a directory named with its version. You can copy it to a directory that is in your $PATH, like /usr/local/bin:

$ sudo mv ~/cx_0.1.10_linux_amd64/cx /usr/local/bin

Now you can initialise the Toolbelt and link it to your Cloud 66 account:

cx init  

Follow the instructions and your Toolbelt is linked to your Cloud 66 account.

Now you can use the Toolbelt directly from your Codeanywhere DevBox to deploy your app.

If you have used the Git Repo address of your code when starting your DevBox, you can deploy your stack from ~/workspace:

[Image: Codeanywhere Git URL]

cx redeploy  

This does not need the -s <stack> option, since your stack is deployed from the same Git repository as ~/workspace.

For more information on using the Toolbelt, check out the Cloud 66 Toolbelt documentation.

Happy coding!

Playlist.com on Cloud 66

This is a guest post by Jacob from Playlist.com, who recently moved their infrastructure over to AWS with Cloud 66 management. They managed to reduce their monthly costs by 90%, while drastically improving performance. This is their experience.

Background

Hi, my name is Jacob and I am an engineer at Playlist, an internet radio streaming company. In my role, I wear many different hats and touch many different systems encompassing backend, frontend and devops. Last month, we migrated our primary Ruby on Rails application from Heroku onto AWS EC2 hardware using Cloud 66, with great success. So I wanted to share some of our story.

Since September of last year, we had been using Heroku to host Playlist.com, our main application. We had migrated from entirely custom servers with custom bash scripts for deployment to Heroku for the ease of rapid deployments, and also for the ability to quickly attach add-ons to our application (like Redis). We also appreciated the ability to scale up our app-serving capacity as our userbase grew. Being able to run git push to trigger deployments made everything quick. However, as time progressed, cost and network performance really began to hurt.

The Problem

Costs

Running our app was expensive on Heroku. Our hosting bill started out in the hundreds of dollars per month and steadily grew towards thousands per month. In May of 2014, we launched a more complex backend that nearly tripled our costs.

At the peak of the problem, we were running an average of 24 2X dynos (!) per day in order to keep request queuing time down. Anything lower and requests would start to pile up, response times would begin to climb, and we could literally watch the number of concurrent users in Google Analytics plummet 80%. There were multiple pain points with our backend that could potentially have been solved by rearchitecting to use more background processes and perhaps something like websockets, but with a small team, we really could not afford to stop development on the next backend in order to patch the current one.

We began building the backend for the next major version of our products. By using Go and incorporating a lot of the lessons learned from the first backend, it was considerably more performant. But in the meantime, we really needed to do something with the Rails app until we could complete and test the new backend.

The Solution

I had heard of Cloud 66 about two years ago on Hacker News, but had never really given it a serious try. After some research, I decided to spin up our Rails app on EC2 using Cloud 66 to see if it might be a viable alternative to Heroku. It was painless to set up. After it analyzed our application, I edited the environment variables with our custom values and told Cloud 66 not to manage the database (we use Amazon RDS), and Cloud 66 built our first server. Customer support was super helpful, especially via chat, and I increased our server count to three (on c3.large instances). Once I was confident that the application was running correctly, I switched our DNS record at CloudFlare, and instantly we were live in production.

Side note: the Cloud 66 toolbelt is very useful. During the migration, it was easy to keep track of the logs with the cx tail command.

Performance

The blue bar between 10/21 08:00 and 10/21 16:00 is our switch to Cloud 66, and the difference in web app performance was immediately noticeable. Plus, this was with only 3 servers (instead of those 24 dynos). Our number of song plays per day (an important metric to us) increased almost immediately by about 15%, demonstrating again that web app performance directly impacts key metrics.

Cost was drastically improved. We increased our server count to 4, just to be safe, and with AWS reserved instances, our monthly costs are now less than 10% of what they were on Heroku. App performance is significantly more stable and response times are lower.

Additionally, we did not lose the convenience of being able to deploy with git, and actually improved the workflow by using the Cloud 66 redeploy hook with our continuous integration service, Circle CI. Now if our tests pass, deployment is automatic. Check out the Github Flow for more information on our development process.
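For reference, the hook call itself is just an HTTP POST, so any CI service that can run a script after a green build can trigger a deployment (the environment variable name here is our own convention; the URL comes from the stack information page):

$ curl -X POST -d "" "$CLOUD66_REDEPLOY_HOOK_URL"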

Going Forward

We really love the experience with Cloud 66. I was a little disappointed that we would not be able to use Cloud 66 when we switch to the non-Rails Go backend. However, Cloud 66 is working on supporting Docker, so we are happy we will be able to package our app into a container and continue to take advantage of the awesome features and support of Cloud 66.

Another feature I am excited about utilizing is Elastic Addresses - we can spin up a backup deployment on another cloud (Digital Ocean, Rackspace, etc.), and then if (/when) AWS us-east-1 goes down, we can failover to the other cloud.

Overall, we have been very happy with Cloud 66 for our application deployment. If you have not tried it out, I would highly recommend it.

Continuous delivery with Codeship and Cloud 66

Codeship and Cloud 66

We're really excited to announce our partnership with the awesome guys at Codeship! We love the idea of building for the builders (aka developers), and by helping developers focus on their code and automating its delivery, they can do more of what they love.

Why Codeship?
Our partnership with Codeship is founded on the idea of making lives easier for developers, and this integration does exactly that.

By triggering automated tests every time you push your code, you can deploy with confidence. Your users will love the speed at which your new features are rolled out!

How does Cloud 66 integrate with Codeship?
Integrating Cloud 66 with Codeship is as simple as copying and pasting a URL!

Once you've deployed your stack with Cloud 66, you'll see a Redeployment hook URL on your Stack information page. When you visit your project on Codeship, simply click "Set up Continuous Deployment", select "Script" and paste your URL into the field in this format:

curl -X POST -d "" [Redeployment hook URL]  

We're working hard to make this process simpler, and would love to hear your feedback! We're all about making your life easier, and we're all ears :)