Recently we identified an issue that affects Cloud 66 Maestro CSv1 (Legacy Docker) Applications and Node Applications that are running on AWS. This issue was caused by a change in a core networking library used by AWS, which broke parts of the Docker networking configuration we have been using. The good news is that we've resolved the underlying issue. Unfortunately, fixing your application, while trivial, will require your action.
Who is affected?
If you are running on AWS servers and have Maestro CSv1 (Legacy Docker) Applications or Node Applications, your application is most likely affected by this issue. Affected applications will see their StackScore drop to "F", along with the corresponding StackScore warnings. If you have StackScore email notifications enabled, you will have received an email about this drop.
If your applications are not running on AWS, or are not Maestro CSv1 (Legacy Docker) Applications or Node Applications, they will continue to work. You can still update them at any point.
What is the issue?
The problem is caused by an incompatibility between the networking library we were using and a change made by AWS to their own core networking libraries (most likely in line with hardware configuration changes). The issue presented itself when one of the affected servers restarted (or recovered from a crash): the server would report the error "Cannot connect to the Docker daemon" and, as a result, containers could not talk to each other.
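If one of your servers has already restarted and you want to confirm whether it is showing this symptom, a quick check along the following lines may help. This is only a minimal sketch, not an official Cloud 66 tool: it assumes the docker CLI is installed on the server and that you run the script there, and the function names are our own illustration. It runs "docker info" and looks for the error text quoted above. It only tells you whether the Docker daemon answers, not whether container-to-container networking is healthy.

```python
import subprocess

# Error text quoted in this post; the docker CLI prints it to stderr.
DAEMON_ERROR = "Cannot connect to the Docker daemon"


def is_daemon_error(output: str) -> bool:
    """Return True if the given text contains the daemon connection error."""
    return DAEMON_ERROR.lower() in output.lower()


def docker_daemon_reachable(timeout: float = 10.0) -> bool:
    """Run `docker info` and report whether the daemon responded."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # docker CLI missing, not executable, or hung: treat as unreachable.
        return False
    if is_daemon_error(result.stderr):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("reachable" if docker_daemon_reachable() else "unreachable")
```

A server that has not restarted since the AWS change may still report "reachable" here, so a passing check does not mean the application is unaffected.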
What does it mean for my running application?
While your application is going to continue to work, it could break if your servers crash or if you restart your servers manually. Note that while it isn't unheard of, your servers will not normally restart without your intervention.
How can I fix this?
We have rolled out a fix for this issue, so newly created applications will no longer be affected. Unfortunately, applying the fix to existing applications involves recreating the networking components (and likely updating Docker), which means downtime during the operation. The downtime covers reinstalling the Docker components and networking, then re-pulling and restarting your containers. For most applications, this means around 3 to 10 minutes of downtime. (That's why we didn't roll this out automatically: we want you to be able to choose the best time to apply the fix.)
To apply the fix for this issue, use the "Deploy with Upgrades" option instead of a simple Deploy the next time you deploy. Please select the following options:
- "Parallel Deployment"
- "Apply Security Upgrades"
- "Yes, reboot my servers if required"
- "Apply Docker upgrades"
This will upgrade the networking layer of your application and fix the underlying issue. If you need help with this upgrade, please get in touch with our support team. We will work with you to patch your applications!
Cloud 66 Crew