503 Back-end server is at capacity

This is a simple concept that some may find very basic, but since we have recently seen a lot of users having problems serving their site after adding load balancers, I thought it was worth covering. Most of the issues were related to redirecting HTTP to HTTPS; a few were caused by slow backend servers.

This approach can help you track down most issues with your website and work out which part is at fault.

Before I go into details, here are a few key points you should always keep in mind:

  • Load balancers (LBs) have a mechanism to check whether their backend servers are healthy, called a health check. It is essentially a simple HTTP request to the backend server's IP address, which you can almost mimic by running:
$ curl -I http://BACKEND_SERVER_IP_ADDRESS/

When I say you can almost mimic it, I mean it is not 100% accurate, because the LB's health check requests have a timeout on the LB side. For instance, if your server is slow, you may still get a response by running the curl command above, but the LB may close the connection before the response has been generated. That is the whole point of health checks and load balancing, right?

Note:
The LB expects an HTTP 200 (OK) status code from its backend servers. If the LB decides a backend is not healthy, it will not send any traffic to it, as if it didn't exist, i.e. you cannot force it.
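
To get a bit closer to what the LB sees, you can give curl its own deadline. This is only a rough sketch: the 5-second default and the function names are my assumptions, not the LB's real settings, so check your load balancer's configuration for the actual timeout.

```shell
#!/bin/sh
# Hypothetical helpers; the default timeout is an assumption, not the LB's real value.

# Print the HTTP status code a backend returns, giving up after a deadline.
# curl prints 000 when it gets no HTTP response at all (e.g. on a timeout).
probe_backend() {
  # $1 = backend URL, $2 = timeout in seconds (defaults to 5)
  curl -s -o /dev/null -I --max-time "${2:-5}" -w '%{http_code}' "$1"
}

# Turn a status code into a verdict: only a plain 200 counts as healthy.
health_verdict() {
  case "$1" in
    200) echo "healthy" ;;
    000) echo "no response" ;;
    *)   echo "unhealthy" ;;
  esac
}

# Usage (replace the placeholder with a real backend address):
#   health_verdict "$(probe_backend http://BACKEND_SERVER_IP_ADDRESS/ 5)"
```

If this prints "no response" with a short timeout but curl without a timeout succeeds, a slow backend is a likely suspect.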


  • HTTPS needs an SSL certificate to be served, so make sure your stack has an SSL certificate installed

  • The best tool is the curl command

  • Leave redirection to the web server (Nginx for Cloud 66 stacks)

Nginx is a web server, so it is best to leave all the web server's responsibilities, such as redirection, to it. Sending HTTP to HTTPS is a redirection, and it is best not to force it in your application.

These are the cases I've seen:

On Rails stacks, some users have force_ssl enabled in their app, which is not advised for production because redirection is Nginx's responsibility; it also makes it difficult to pinpoint the issue.

On Docker stacks, some users set up the redirect inside their app/container, which again is not good practice.

These are the steps to take:

First, you need to see if your website is pointing to your LB by running:

$ dig +short LB_ADDRESS yourwebsiteaddress.com

To find LB_ADDRESS, go to your stack's page and find the Cloud 66 address of your stack, say http://example.c66.me/ (pass dig the hostname only, without the http:// part). Then check whether the results for the two names are the same. If they are, move on to the next step; if they are not, you need to fix your DNS first and then go through this again.
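
If you'd rather not compare the two results by eye, here is a small sketch that does it for you. The function names are made up for illustration, and it only looks at IPv4 addresses, which is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helpers; hostnames in the usage note are placeholders.

# Keep only the IPv4 addresses from `dig +short` output (drops CNAME lines), sorted.
ipv4_only() {
  { grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || true; } | sort -u
}

# Succeed when two `dig +short` outputs contain the same non-empty set of addresses.
same_addresses() {
  a=$(printf '%s\n' "$1" | ipv4_only)
  b=$(printf '%s\n' "$2" | ipv4_only)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   site=$(dig +short yourwebsiteaddress.com)
#   lb=$(dig +short example.c66.me)
#   if same_addresses "$site" "$lb"; then echo "DNS points at the LB"; else echo "fix DNS first"; fi
```

Bear in mind that some load balancers rotate their addresses, so a mismatch is a hint to look closer rather than proof of a DNS problem.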

Then run:

$ curl -I http://yourwebsiteaddress.com/

and you'll get something like the output below (the 503 is the important part):

HTTP/1.1 503 Service Unavailable: Back-end server is at capacity  
Connection: keep-alive  

This means that the LB considers all the backends to be down, so it replies with 503 (Back-end server is at capacity).

Now try to curl one of the backend servers directly. Its address is probably something like http://servername.example.c66.me/

So you need to run:

$ curl -I http://servername.example.c66.me/

The result of this is very important, because you may be able to browse the site when you hit the server directly but not via the load balancer.

The result must be:

HTTP/1.1 200 OK  
...

That is the only result the LB accepts (obviously this may not hold if you are running your own DIY LB). With anything else, the health check requests from the LB will fail, which means it will consider its backend to be down and will not forward users' requests to it.

In the Cloud 66 Nginx config we take care of this by not redirecting requests that come from the LB while redirecting the rest.
So if you force SSL in your app, this will play havoc with our settings, unless you take care of them yourself.

If the response is 200 and you still see 503, you'll need to check the backend server's Nginx access.log (you need to SSH into one of the backend servers). The access.log file is either /var/log/nginx/access.log or /opt/nginx/logs/access.log.

You can follow the logs on the server with one of the commands below and see whether any error code shows up constantly (the LB sends a health check request every couple of seconds):

sudo tail -f /var/log/nginx/access.log  

or

sudo tail -f /opt/nginx/logs/access.log  

and decide based on the error codes you see.
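
If the log is too busy to read live, a quick tally of the non-200 responses can help. This sketch assumes Nginx's default "combined" log format, where the status code is the 9th whitespace-separated field; if your log format is customised, the field number will differ. The function name is just for illustration.

```shell
#!/bin/sh
# Assumes Nginx's default "combined" log format (status code in field 9).
# The function name is hypothetical.

# Read access-log lines on stdin and print "count status" for every
# non-200 response, most frequent first.
count_non_200() {
  awk 'NF >= 9 && $9 != 200 { seen[$9]++ }
       END { for (code in seen) print seen[code], code }' | sort -rn
}

# Usage on a backend server:
#   sudo tail -n 1000 /var/log/nginx/access.log | count_non_200
```

A status code that repeats every few seconds for the health check path is most likely what the LB is seeing.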

To reiterate, here are the main issues that could be causing the load balancer to register your servers as unhealthy:

  1. We provision your load balancers with a default health check endpoint of "/". If this endpoint does not exist in your application, you can change the health check endpoint in the manifest. Note that you will have to reinstall your load balancer.
  2. Your app is forcing SSL. Load balancers generally use HTTP for the health check and expect an HTTP 200 response. If your app is forcing SSL, it will redirect the HTTP request to HTTPS by responding with an HTTP 301, which a client will generally follow. However, load balancers will NOT follow redirects, so the HTTP 301 will be registered as an unhealthy server. You should instead let NGINX handle the SSL-specific actions (which we configure for you) and remove the HTTPS redirection from your app.
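
As a rule of thumb, the status code you get back from curling a backend directly points at one of the two issues above. The mapping below is a sketch with a made-up function name, not an exhaustive diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: map the status code from a direct backend request
# to the likely issue from the list above. A rule of thumb only.
likely_cause() {
  case "$1" in
    200)             echo "ok" ;;
    404)             echo "issue 1: health check endpoint missing" ;;
    301|302|307|308) echo "issue 2: request is being redirected" ;;
    *)               echo "other: check the access log" ;;
  esac
}

# Usage:
#   likely_cause "$(curl -s -o /dev/null -I -w '%{http_code}' http://servername.example.c66.me/)"
```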

Now that I've covered this, it is worth mentioning that you can use the same approach to solve most of your web server's issues.

For instance, you may run this:

$ curl -I yourwebsiteaddress.com

and you get something like:

HTTP/1.1 301 Moved Permanently  
Server: nginx  
Date: Tue, 12 Sep 2017 11:25:48 GMT  
Content-Type: text/html  
Content-Length: 178  
Connection: keep-alive  
Location: https://yourwebsiteaddress.com/  
X-Powered-By: cloud66  

Then you need to curl the URL given in the Location header of that response:

$ curl -I https://yourwebsiteaddress.com/

Then work out the next step from there, or fix the error if there is one. I think you get the idea.
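
If you find yourself copying Location values around a lot, a tiny helper can pull the next URL out of the headers for you. The function name is hypothetical; it simply reads `curl -I` output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `curl -I` output on stdin and print the value of
# the first Location header, if any (header names are case-insensitive).
next_location() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage:
#   curl -sI yourwebsiteaddress.com | next_location
```

curl -IL would follow the whole chain for you, but stepping through it one hop at a time makes it easier to see exactly which hop fails.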
