7 Mistakes That Prevent Successful Scaling of Your Cloud Native App

As software products move from the MVP phase to the growth phase, scaling your software becomes even more important. However, many of the assumptions that were built into your systems will no longer be valid. This can lead to mistakes that prevent taking full advantage of the scaling capabilities of your cloud infrastructure. It can also have a real impact on your customers as you struggle to implement the necessary steps for high availability and elasticity required of today's cloud native applications.

Let's examine 7 common mistakes made as cloud native applications start to transition to the growth phase, and how to prevent these mistakes through proper cloud native architecture practices.

Mistake #1: Assuming that servers will run forever

When deploying to a data center or to the public cloud using a traditional approach, servers are assumed to be long-lived. These servers are considered the lifeblood of the product and are meant to run for years. They must be continually updated, patched, and kept running at all costs.

By contrast, cloud servers should be considered utility resources that may be created and terminated on demand. Cloud providers encourage cloud native applications to be resilient to outages. This means that software must be prepared to deal with underlying hardware failures, reboots for critical security patches, and other issues that can cause an individual server to become unavailable temporarily or permanently.

Because cloud servers are utility resources, software must not assume that any particular server will run forever. Treat every server as disposable: automate provisioning so a replacement can be launched in minutes, and keep irreplaceable state off individual servers so that losing one is a non-event.
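One practical consequence is that session state must live outside the server handling the request. Here is a minimal sketch of that idea, with an in-memory dict standing in for an external store such as Redis or DynamoDB (the class and function names are illustrative, not any particular library's API):

```python
class ExternalSessionStore:
    """Stands in for a network-accessible store such as Redis or DynamoDB."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session

def handle_request(store, session_id, item):
    """Stateless handler: reads and writes session state externally,
    so the server running it can be terminated and replaced at any time."""
    session = store.get(session_id)
    cart = session.setdefault("cart", [])
    cart.append(item)
    store.put(session_id, session)
    return len(cart)

store = ExternalSessionStore()
handle_request(store, "sess-1", "book")         # served by "server A"
count = handle_request(store, "sess-1", "pen")  # could be "server B"
print(count)  # 2 -- the state survived because it never lived on a server
```

Because no request depends on the server that handled the previous one, any server in the group can be terminated without losing customer data.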

Learn more about taking advantage of short-lived cloud servers.

Mistake #2: Failure to use load balancing and autoscaling groups

As cloud servers are considered utility resources, cloud native applications should be built with a share-nothing design. This allows servers to be added and removed to handle current load requirements - commonly referred to as elasticity. If cloud servers are assumed to run forever, elasticity won't be possible as the number of servers cannot be varied based on application load - resulting in server resources becoming overwhelmed.

Load balancing and auto scaling groups help prepare your cloud architecture to support elasticity. Auto scaling groups allow servers that perform the same task to be managed together, increasing or decreasing the number of servers in the group based on current demand. Load balancers distribute requests to each server within an auto scaling group. Combined, your cloud native architecture can distribute incoming requests across a group of servers that is scaled up or down based on overall load.
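To make the scaling half of this concrete, here is a sketch of the arithmetic behind target-tracking scaling: pick a per-server metric (say, average CPU utilization), pick a target value, and size the group so the metric moves toward the target. This is a simplified illustration of the approach, not any provider's exact algorithm:

```python
import math

def desired_capacity(current_servers, current_metric, target_metric,
                     min_size, max_size):
    """Target-tracking sketch: resize the group so the per-server metric
    (e.g. average CPU utilization) moves toward the target value,
    clamped to the group's configured minimum and maximum size."""
    if current_metric <= 0:
        return min_size
    desired = math.ceil(current_servers * current_metric / target_metric)
    return max(min_size, min(max_size, desired))

# 4 servers at 90% average CPU with a 50% target -> scale out to 8
print(desired_capacity(4, 90.0, 50.0, min_size=2, max_size=10))  # 8
# 4 servers at 20% average CPU -> scale in toward the minimum
print(desired_capacity(4, 20.0, 50.0, min_size=2, max_size=10))  # 2
```

The min/max clamp matters: it keeps a traffic spike from scaling you past your budget, and keeps a quiet period from scaling you below the capacity needed for redundancy.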

Learn more about using load balancing and auto scaling for achieving elasticity.

Mistake #3: Assuming local filesystems are long-lived

Local server storage is still an option for cloud native architectures. However, local storage should be considered ephemeral (i.e. not long-lasting): use it only for storing temporary files and logs before they are moved to long-term storage.

This approach requires that configuration files be obtained externally, since the server may be destroyed at any time. Databases using local ephemeral storage should either replicate their data to other servers immediately, or the application should be designed to withstand data loss when a server shuts down.
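The "write locally, ship to durable storage" pattern can be sketched as follows. An in-memory class stands in for an object store such as Amazon S3 or GCS, and the key name is illustrative:

```python
import os
import tempfile

class ObjectStore:
    """Stands in for durable object storage such as Amazon S3 or GCS."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

def rotate_log(object_store, local_path, key):
    """Treat the local file as ephemeral: ship its contents to durable
    storage, then delete the local copy. If the server dies before the
    upload, only the unshipped tail is lost."""
    with open(local_path, "rb") as f:
        object_store.put(key, f.read())
    os.remove(local_path)

store = ObjectStore()
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
tmp.write(b"request handled in 12ms\n")
tmp.close()
rotate_log(store, tmp.name, "logs/server-1/app.log")
print(store.get("logs/server-1/app.log"))
```

The key design point is that the local disk is only ever a buffer: anything you would miss if the server vanished should already be somewhere else.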

There are three common cloud storage solutions available for cloud native applications: local ephemeral storage, network filesystem storage, and object storage. Each one offers specific advantages and disadvantages. We recommend reading our guide to cloud storage to determine the best one(s) for your application's needs.

Mistake #4: Inability to scale out database queries

There are a few techniques that enable a database to scale out across more servers, and therefore offer more capacity. Some techniques work better for specific types of databases or specific vendors:

Read replicas make data available for reading across any number of servers, called “slaves”. One server remains the “master” and accepts all incoming write requests, along with read requests. This technique is common for relational databases, as most vendors support replicating data to multiple read-only servers. The more read replicas installed, the further read traffic can be scaled out.

The multi-master technique allows any client to write data to any database server, making every replica a master rather than a read-only slave. This lets applications scale out both reads and writes. However, this choice comes with additional requirements, such as conflict resolution, that the application must support.
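Read replicas only help if the application sends its reads to them. A minimal sketch of client-side read/write splitting, with hypothetical hostnames and a naive SQL check (real drivers and proxies are more careful about statements like CTEs and `SELECT ... FOR UPDATE`):

```python
import itertools

class ReplicatedDatabase:
    """Client-side read/write splitting: writes go to the master,
    reads rotate across read replicas (hostnames are illustrative)."""
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Naive heuristic: only plain SELECTs are safe for replicas.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.master

db = ReplicatedDatabase("db-master", ["db-replica-1", "db-replica-2"])
print(db.route("SELECT * FROM orders"))        # db-replica-1
print(db.route("SELECT * FROM users"))         # db-replica-2
print(db.route("UPDATE users SET name='a'"))   # db-master
```

Note that replication lag means a read routed to a replica may not yet see a write the same user just made; applications often pin a session to the master briefly after a write for this reason.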

Horizontal partitioning, also called “sharding”, distributes data across servers. Data may be partitioned to different servers based on a specific customer/tenant, a date range, or another sharding scheme. Vertical partitioning separates the data within a single table, grouping columns into frequently accessed and rarely accessed sets. Either pattern lets the database and its cache manage less information at once, improving overall throughput.
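A tenant-based sharding scheme often boils down to a small, stable routing function. A sketch using a hash of the tenant identifier (the tenant IDs here are made up):

```python
import hashlib

def shard_for(tenant_id, shard_count):
    """Hash-based horizontal partitioning: map a tenant to one of N
    shards. A stable hash (not Python's randomized hash()) keeps the
    mapping identical on every application server."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

# Every server computes the same shard for the same tenant.
print(shard_for("tenant-42", 4))
print(shard_for("tenant-42", 4) == shard_for("tenant-42", 4))  # True
```

One caveat worth knowing before choosing this scheme: changing `shard_count` remaps most tenants to different shards, which is why techniques like consistent hashing, or a lookup table from tenant to shard, are common in practice.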

Read our guide to scaling SQL and NoSQL databases for more details.

Mistake #5: Lack of distributed logging for troubleshooting

Once your cloud native application is designed for elasticity, troubleshooting problems can be difficult. Your servers will be created and destroyed routinely, making the practice of logging into a server and reading logs very difficult or even impossible.

Distributed logging enables servers to collect and aggregate log entries across one or more log servers. Distributed log collection overcomes a number of issues encountered by other logging solutions in a cloud native architecture: normalized log entries for easier searching, horizontal scaling by the distribution of collectors across any number of servers, and the ability to integrate with other monitoring and analysis tools through plugins. Distributed logging may also be used to capture, aggregate, and stream events as well as log entries.
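The normalization piece usually means emitting structured entries rather than free-form text, so collectors can search and aggregate them. A sketch of one structured entry, with the field names chosen for illustration (collectors such as Fluentd or Logstash each have their own conventions):

```python
import json
import socket
import time

def log_entry(level, message, **fields):
    """Emit a normalized, machine-parseable log line. Because the host
    is recorded in the entry itself, logs from short-lived servers stay
    attributable after the server is gone."""
    entry = {
        "timestamp": time.time(),
        "host": socket.gethostname(),
        "level": level,
        "message": message,
        **fields,
    }
    return json.dumps(entry)

line = log_entry("ERROR", "payment failed", order_id="o-123", retries=2)
parsed = json.loads(line)
print(parsed["level"], parsed["order_id"])
```

In a real deployment this line would be written to stdout or a local file and picked up by a collector agent, rather than returned to the caller.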

Learn more about how to design a successful distributed logging solution.

Mistake #6: Lack of background processing and messaging support

As our applications grow, we must find ways to manage software complexity while scaling the application. One of the most common techniques is to offload work that doesn't have to be performed immediately into the background. This frees the application to respond to the mobile or web user right away, while handling the work later. It also allows other work to occur outside the normal flow of the application.

A well-known solution to this problem is to use a distributed messaging system. A distributed messaging system enables the application to communicate by sending and receiving messages between various parts of the system.

Distributed messaging scales by adding subscribers when unprocessed messages pile up and removing them when the backlog shrinks. The application doesn't need to know how many subscribers exist; it only needs to create a message and send it to the broker.

This results in a more loosely coupled application, composed of multiple services that are deployed and scaled independently without the rest of the system knowing. Distributed messaging is also the foundation for a microservices architecture.
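The producer/subscriber shape can be sketched with Python's standard library, using an in-process queue as a stand-in for a real broker such as RabbitMQ, SQS, or Kafka, and threads standing in for independently scaled worker processes:

```python
import queue
import threading

broker = queue.Queue()   # stands in for RabbitMQ, SQS, Kafka, ...
results = []
lock = threading.Lock()

def subscriber():
    """Background worker: pulls messages until it sees the stop signal.
    Scaling out is simply starting more of these workers."""
    while True:
        message = broker.get()
        if message is None:          # stop signal
            break
        with lock:
            results.append(f"emailed {message['to']}")

workers = [threading.Thread(target=subscriber) for _ in range(3)]
for w in workers:
    w.start()

# The web tier only publishes; it neither knows nor cares how many
# subscribers are running.
for i in range(5):
    broker.put({"to": f"user{i}@example.com"})

for _ in workers:
    broker.put(None)   # one stop signal per worker
for w in workers:
    w.join()
print(len(results))  # 5
```

A real broker adds what this sketch lacks: durability if a worker crashes mid-message, delivery across machines, and acknowledgements, which is precisely why the broker sits between the services rather than inside one of them.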

Read our guide on distributed messaging techniques to scale cloud native applications for more details on how to offload background work and introduce a loosely-coupled architecture.

Mistake #7: Failure to move processing closer to the source

The Domain Name System (DNS) is similar to a phone book for the Internet. It is most often used to map a hostname to a server's IP address (A records), alias one name to another (CNAME records), route email to the proper mail server (MX records), and store general-purpose text entries (TXT records). Some DNS servers support round-robin DNS, allowing multiple IPs to be listed for a given entry and returned in rotating order.

Round-robin DNS can also distribute incoming traffic across multiple data centers, often spread across multiple geographical regions. With geo-based DNS, rules route traffic to a specific data center based on the country where the request originated, improving the overall performance of the application. Without geo-based DNS support, requests from one region (e.g. South America) may be sent to servers in another (e.g. North America), causing slow network performance.
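The combination of the two ideas, a geo rule to pick the region and round-robin within it, can be sketched as follows. The addresses (from documentation ranges) and country codes are illustrative; a geo-aware DNS service such as Route 53 applies rules like these at resolution time:

```python
import itertools

# Hypothetical per-region address pools (documentation IP ranges).
REGION_POOLS = {
    "BR": itertools.cycle(["203.0.113.10", "203.0.113.11"]),    # South America
    "US": itertools.cycle(["198.51.100.10", "198.51.100.11"]),  # North America
}
DEFAULT_POOL = itertools.cycle(["198.51.100.10"])

def resolve(hostname, client_country):
    """Geo rule first (send the client to the nearest data center),
    then round-robin within that region's pool of addresses."""
    pool = REGION_POOLS.get(client_country, DEFAULT_POOL)
    return next(pool)

print(resolve("app.example.com", "BR"))  # 203.0.113.10
print(resolve("app.example.com", "BR"))  # 203.0.113.11
print(resolve("app.example.com", "US"))  # 198.51.100.10
```

Real geo DNS also layers in health checks, removing a data center's addresses from rotation when it fails, which this sketch omits.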

Learn more about using DNS for global/geo-based load balancing.