Troubleshooting common build and deployment problems

The vision behind Cloud 66 was to create tools that help other devs like ourselves make app deployment easy. It was about saving you time and making things that little bit easier, so you can invest your energy in the right thing: developing great apps.

As we've found out from regularly speaking with our customers, many of you actually enjoy picking up operational skills for personal growth and to level up your own capabilities. And we love how Cloud 66 makes that easy for you as the bridge between dev and ops.

Having said that, we also know that it's sometimes possible to get stuck on a minor technicality, which can bring things to a grinding halt. So I thought I'd focus this post on running through some common scenarios you may hit when building servers or getting ready to deploy an app.

Scenario 1: Package Fails to Install

This is typically caused by connectivity issues between the server and the apt repositories. (The other possibility is that the package's installation procedure has changed, but that almost never happens.) It could be that the repository is down, or that there is an issue with the underlying hardware, so your server cannot connect to the sites it needs to. We recommend following these steps:

If the failure happens while building a stack, recreate the stack so that your server is built on different hardware.
If the failure happens while scaling up, remove the server and scale up again so that the new server is built on different hardware.
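
Before rebuilding, it can help to confirm that the server really cannot reach its package repositories. The following is a minimal shell sketch, not part of Cloud 66's tooling: the function names repo_hosts and check_repos are made up for illustration, and it assumes a Debian or Ubuntu server with curl installed and repositories listed in the classic one-line sources.list format.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Cloud 66's tooling.

# Print the unique repository hosts referenced by one or more apt sources files.
repo_hosts() {
    # Keep only active "deb" / "deb-src" lines, extract the URL, strip the scheme.
    grep -hE '^[[:space:]]*deb(-src)?[[:space:]]' "$@" 2>/dev/null \
        | grep -oE 'https?://[^/ ]+' \
        | sed -E 's#^https?://##' \
        | sort -u
}

# Send a HEAD request to each host and report which ones answer within 10 seconds.
check_repos() {
    for host in $(repo_hosts "$@"); do
        if curl -sS --head --max-time 10 -o /dev/null "http://$host/"; then
            echo "ok   $host"
        else
            echo "FAIL $host"
        fi
    done
}

# Usage (on the affected server):
#   check_repos /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```

If every host fails, the problem is more likely the server's network than any one repository, which is the case where rebuilding on different hardware tends to help.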

Scenario 2: x509 error in a Docker container

The x509: failed to load system roots and no roots provided error appears when you attempt to make an HTTPS request from inside a container. It can happen if the base image used to build your Docker container doesn't have the root CA certificates installed.

Here's how it can be reproduced:

FROM ubuntu:14.04.1  
CMD curl https://www.google.com

The solution is to install the ca-certificates package while building your Docker image:

FROM ubuntu:14.04.1

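# Refresh the package index and install the root CA bundle in a single layer,
# so the install never runs against a stale cached index.
RUN apt-get update && apt-get install -y ca-certificates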

CMD curl https://www.google.com

Scenario 3: Cap Deploy Failed Error

Cloud 66 uses Capistrano to deploy Rails stacks. If your deployment fails, you're likely to see the error Cap deploy failed in your logs.

In order to understand why the failure occurred, you'll need to:

Click on the "full details" and/or help link
Under "Error Log Server", click on view logs (which will take you directly to the failed server)
If that doesn't show the full log details, click on "MORE LOGS" at the top right of your log box
Starting from the bottom, scroll up to find the entry explaining why the deployment failed

Scenario 4: Asset Pipeline Error

Asset pipeline manifest configurations can be the cause of deployment failures if there are existing issues with old assets.

It's possible to clear out old assets on the server manually. You can do this by starting a terminal connection to your server and following these steps:

Remove all the contents from your $STACK_BASE/shared/assets folder
Create a new, empty manifest.yml file by issuing touch $STACK_BASE/shared/assets/manifest.yml
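
The two steps above can be sketched as a small shell function. This is an illustration rather than an official Cloud 66 script: reset_assets is a hypothetical name, and it assumes $STACK_BASE is already set in your terminal session.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes STACK_BASE is set in the environment.

# Empty the shared assets folder, then recreate a blank manifest.yml inside it.
reset_assets() {
    assets_dir="${STACK_BASE:?STACK_BASE is not set}/shared/assets"
    if [ ! -d "$assets_dir" ]; then
        echo "no such directory: $assets_dir" >&2
        return 1
    fi
    # -mindepth 1 keeps the folder itself and deletes everything inside it,
    # including hidden files and nested directories.
    find "$assets_dir" -mindepth 1 -delete
    touch "$assets_dir/manifest.yml"
}

# Usage (on the server):
#   reset_assets
```

The guard on STACK_BASE matters here: without it, an unset variable would turn the path into /shared/assets, and you don't want a delete command pointed at an unintended location.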

Scenario 5: Let's Encrypt installation error

During the Let's Encrypt installation process, you could run into an error that looks something like this:

Wrote file to /etc/cloud66/webroot/FILENAME, but couldn't download http://DNS_NAME/.well-known/acme-challenge/FILENAME

If this is the case, you'll need to implement the following steps:

Delete the existing SSL certificate prior to step 2
If your infrastructure is behind Cloudflare and you're using a global HTTPS redirect, you'll need to add a page rule to get things working, because Let's Encrypt needs a non-secure HTTP endpoint (/.well-known/acme-challenge/*) to issue and renew certificates.
There could be some parts missing from your Nginx configuration, potentially because of a customization or because the config file is not up to date. You can find instructions on how to apply HTTP to HTTPS redirects on our help page here.

Remember to first delete the SSL certificate and then apply your changes!!
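
Before retrying, you can check from any machine whether the challenge path is reachable over plain HTTP. The sketch below is an assumption-laden illustration, not Cloud 66 tooling: classify_challenge and probe_challenge are hypothetical names, it assumes curl is installed, and DNS_NAME and FILENAME stand for the same placeholders as in the error message above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Cloud 66 or Let's Encrypt tooling.

# Interpret the HTTP status code (and redirect target, if any) returned for a challenge URL.
classify_challenge() {
    code=$1
    location=$2
    case "$code" in
        200) echo "ok: served over plain HTTP" ;;
        301|302|307|308)
            case "$location" in
                https://*) echo "redirected to HTTPS: exempt the challenge path from the redirect" ;;
                *)         echo "redirected to $location" ;;
            esac ;;
        404) echo "reachable over HTTP, but the file was not found" ;;
        000) echo "no response: check DNS and that port 80 is open" ;;
        *)   echo "unexpected status $code" ;;
    esac
}

# Request the challenge URL without following redirects and classify the result.
probe_challenge() {
    url="http://$1/.well-known/acme-challenge/$2"
    result=$(curl -s -o /dev/null --max-time 10 -w '%{http_code} %{redirect_url}' "$url")
    # result looks like "301 https://..." or "200 " (empty redirect target).
    classify_challenge "${result%% *}" "${result#* }"
}

# Usage:
#   probe_challenge DNS_NAME FILENAME
```

A redirect to HTTPS points at the Cloudflare page rule or the Nginx redirect configuration, while no response at all suggests a DNS or firewall problem instead.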

Scenario 6: Backups Failure

On occasion, you may run into issues when setting up backups and get the following output:

ModelError: Backup for task (....) Failed!
An Error occured which has caused this Backup to abort before completion.
Reason: Packager::PipelineError
Failed to Create Backup Package
Pipeline STDERR Messages:
(Note: may be interleaved if multiple commands returned error messages)

The following system errors were returned:
Error: SystemCallError: Unknown error 141 - 'tar' returned exit code: 141
Error: Errno::EPERM: Operation not permitted - 'split' returned exit code: 1

First, confirm that your server has enough free disk space and inodes by running df -h and df -i. If it doesn't, you'll need to free up space on the server. It's also worth noting that the time your backup takes to run may be what's causing friction in the system: for example, if you schedule backups hourly but each backup takes more than 60 minutes to complete, runs will overlap.
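
The disk check can be scripted so that only the filesystems that are nearly full get reported. This is a rough sketch under a few assumptions: over_threshold and report_usage are hypothetical names, the 90% threshold is arbitrary, and it expects a Linux server where df supports the -P and -i flags and mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the 90% default threshold is arbitrary.

# Read `df -P` (or `df -Pi`) output on stdin and print each mount point whose
# usage percentage is at or above the given threshold.
over_threshold() {
    limit=${1:-90}
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# Report filesystems that are low on disk space or on inodes.
report_usage() {
    echo "Filesystems low on disk space:"
    df -P | over_threshold "${1:-90}"
    echo "Filesystems low on inodes:"
    df -Pi | over_threshold "${1:-90}"
}

# Usage (on the server):
#   report_usage 90
```

Checking inodes separately is deliberate: a volume holding many small files can run out of inodes while still showing free space.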

So there you have it. One source of more immediate feedback is our Cloud 66 Slack community, where you can always tap into the collective knowledge of other Cloud 66 users. There's also lots more content like this on our Help pages. Thanks and happy coding.