At Rappid Design we have developed many applications where scale has been a concern. Key stakeholders want to ensure that their applications will handle a growing user base should their sales and marketing efforts prove fruitful. “How do we make sure our users don’t experience slowness in the app?” and “What happens if this goes viral?” are questions we get asked regularly. Scaling and performance are often coupled together, but the two are not always related. A poorly written app can perform badly even when only one user is accessing it, and that is a problem to fix before there is any need to scale. This article focuses primarily on scaling, which is about handling increasing traffic (load).
This topic is worthy of many volumes, but I want to explore a common application style and architectural approach that has been widely adopted in recent years. This is shown in the following diagram:
The front end is usually delivered via a browser or a mobile application and is often referred to as the presentation layer. Web applications are commonly built with frameworks such as Angular, React and Vue, although there is a plethora of very good options. A lot of the heavy lifting with respect to rendering pages is handled on the device, and the same is true for mobile applications. This is one of the easier aspects to consider when scaling because there isn’t really any scaling to consider: the files that constitute the application are downloaded to the device and then executed locally.
Application Programming Interface / Backend
The API is mainly concerned with managing data and passing it between the layers. Most applications are composed of many CRUD operations (Create, Read, Update, Delete). The API also handles authentication and authorization, ensuring clients only have access to data that is relevant to them. This data is usually stored in a database.
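As a minimal sketch of the authorization idea, the read operation below returns a record only when it belongs to the caller. The `Order` type, the in-memory `ORDERS` store and the `read_order` function are hypothetical names for illustration; a real API would verify the caller's identity first and query a database instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    owner_id: int
    total: float


class Forbidden(Exception):
    """Raised when a client asks for data that belongs to someone else."""


# Hypothetical in-memory store standing in for the database.
ORDERS = {
    1: Order(order_id=1, owner_id=10, total=25.0),
    2: Order(order_id=2, owner_id=20, total=40.0),
}


def read_order(requesting_user_id: int, order_id: int) -> Order:
    """The 'Read' in CRUD: return an order only if the caller owns it."""
    order = ORDERS[order_id]
    if order.owner_id != requesting_user_id:
        raise Forbidden(f"user {requesting_user_id} cannot read order {order_id}")
    return order
```

Here user 10 can read order 1, while a request from user 10 for order 2 raises `Forbidden`.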
Most applications require data of some kind to be stored. In most cases the choice is between SQL and NoSQL. There are pros and cons to both, which will be explored in a future article. We are going to assume SQL here, as the majority of the apps we create at Rappid use SQL.
If we were going to deploy the above application on our own computer we would have a web server to run the frontend, some service to run the API and a database engine to handle the data. This can be lifted onto a machine in the cloud and almost run as is, albeit with many drawbacks.
A high percentage of apps are now deployed in the cloud with AWS (Amazon), Google Cloud, Microsoft Azure or similar. These companies take care of all the hardware, operating systems and technology stacks so developers can focus mainly on the code side. A deployment in such an environment that is ready for scale might look like the following:
The diagram may look daunting but thankfully most of the configuration and maintenance of all these parts is handled by the cloud vendor.
Content Delivery Network / Frontend
The Content Delivery Network (CDN) is responsible for delivering your Frontend code to the user’s device. A CDN service usually has nodes dispersed across the planet, so users get a copy of the Frontend from the node nearest to them geographically. This can be quite beneficial for apps that have a global customer footprint, as there is a noticeable time lag when retrieving files from distant locations.
Load Balancer / Application Programming Interface
The Load Balancer in tandem with the API is where most of the work for scaling happens. There are a number of languages the API can be written in and they aren’t all equal. Some are compiled, others are interpreted, and that can impact the number of requests an API can handle. So can the machine it is running on, how well it has been coded and the type of operations it is running. Regardless of these factors, a single instance of an API will reach a point where performance degrades and requests are dropped. This is where a load balancer comes in. It can be configured to spin up new instances of the API based on parameters such as CPU usage, requests received per second and response latency. The Load Balancer will then begin sharing the requests (load) between the running nodes.
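The two jobs described above, deciding when to add an instance and sharing requests between instances, can be sketched roughly as follows. The threshold values and the names `Metrics`, `should_scale_out` and `round_robin` are assumptions made for illustration. Cloud load balancers expose this as configuration rather than code, and round robin is only one of several distribution strategies.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Metrics:
    cpu_percent: float
    requests_per_second: float
    p95_latency_ms: float


# Hypothetical thresholds; real values depend on the API and the vendor.
MAX_CPU_PERCENT = 70.0
MAX_RPS_PER_INSTANCE = 200.0
MAX_P95_LATENCY_MS = 500.0


def should_scale_out(m: Metrics) -> bool:
    """Ask for a new instance if any one of the watched metrics is too high."""
    return (
        m.cpu_percent > MAX_CPU_PERCENT
        or m.requests_per_second > MAX_RPS_PER_INSTANCE
        or m.p95_latency_ms > MAX_P95_LATENCY_MS
    )


def round_robin(instances: list[str], request_count: int) -> list[str]:
    """Share requests by handing them to each running instance in turn."""
    picker = cycle(instances)
    return [next(picker) for _ in range(request_count)]
```

With two instances, five requests are assigned alternately, so neither instance receives more than one request above the other.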
There are some key things to consider with this approach:
- New API instances may not spin up instantly, so it may be prudent to have one more instance running than you need so that some users don’t experience slowness while waiting for a new instance to be ready. Some vendors describe this as a warm vs cold node.
- It is possible to write code that consumes most of the CPU for a given request or a few requests. In such cases the end user can experience poor performance yet the scaling part is operating well.
- No amount of good scaling configuration/work can fully overcome really bad code.
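The first point, keeping one more instance running than the traffic strictly requires, comes down to simple arithmetic. This is a sketch under the assumption that we already know roughly how many requests per second one instance can handle. The function name and the default of one spare are illustrative choices.

```python
import math


def desired_instances(current_rps: float, rps_per_instance: float, warm_spares: int = 1) -> int:
    """Instances needed for the current load, plus spares that are already warm."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    # Always keep at least one instance for the load itself, even when idle.
    needed = max(1, math.ceil(current_rps / rps_per_instance))
    return needed + warm_spares
```

For example, 450 requests per second against instances that each handle 200 needs three instances, so four would be kept running.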
Scaling can sometimes hide code inefficiencies. Understanding how many requests per second (rps) an API can handle can quickly highlight areas which need improvement. We have seen cases where we’ve been able to achieve 100 times the original volume of requests per second, often with just a few changes. This can drastically reduce costs and reduce the complexity and overhead. Using a load testing tool such as k6.io will allow you to baseline API performance and validate speed improvements.
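To make the idea of a baseline concrete, here is a minimal sketch of how requests per second and a 95th percentile latency could be derived from raw request timings. It is not how k6 calculates its metrics. The `summarize` function, the nearest-rank percentile method and the sample numbers are assumptions for illustration only.

```python
import math


def summarize(latencies_ms: list[float], duration_s: float) -> dict[str, float]:
    """Summarize one test run: throughput and 95th percentile latency."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the value below which 95% of requests fall.
    rank = math.ceil(len(ordered) * 95 / 100)
    return {
        "rps": len(ordered) / duration_s,
        "p95_ms": ordered[rank - 1],
    }


# Made-up timings standing in for a run before and after a code change.
before = summarize([100.0] * 10, duration_s=5.0)
after = summarize([10.0] * 19 + [200.0], duration_s=2.0)
improvement = after["rps"] / before["rps"]
```

Recording the same summary before and after each change shows whether the change actually moved the numbers.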
As APIs get more complex, teams (Rappid included) often break them down into smaller APIs, often referred to as Microservices. This reduces the complexity of each API and, importantly, allows scaling work to be focused on the areas of heaviest use. We might find that one API endpoint gets a lot more traffic than another, so we see more instances of one microservice than of another. This might look like this:
The load balancer works out which API is required to service the request and routes the request to it, while working with that API to scale up and down as needed. In this example, the Customer API has more instances running, which could be because it receives many more requests than the others or because it requires more resources per request.
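A rough sketch of that routing step is shown below, assuming requests are matched to a microservice by URL path prefix. Only the Customer API comes from the example above. The other service names, the paths and the `Router` class are hypothetical.

```python
from itertools import cycle

# Hypothetical routing table: path prefix -> running instances of that service.
ROUTES = {
    "/customers": ["customer-1", "customer-2", "customer-3"],
    "/orders": ["order-1"],
    "/invoices": ["invoice-1"],
}


class Router:
    def __init__(self, routes: dict[str, list[str]]) -> None:
        # One round-robin picker per service, so each service balances independently.
        self._pickers = {prefix: cycle(instances) for prefix, instances in routes.items()}

    def route(self, path: str) -> str:
        """Find the service that owns the path, then pick its next instance."""
        for prefix, picker in self._pickers.items():
            if path == prefix or path.startswith(prefix + "/"):
                return next(picker)
        raise LookupError(f"no service registered for {path}")
```

Because the Customer API has three instances in this table, its requests rotate across all three, while the other services each send everything to their single instance.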
Function As A Service
It is common to see APIs act as just a data transit layer moving data between the database and the Frontend. Sometimes, however, there may exist some computationally expensive operation like manipulating an image or video. In these cases, the operation in question can consume most of an API instance’s resources thus causing many more instances to be spun up if many such operations are occurring within the same time window. This is an ideal use case for FaaS (Function As A Service). Most cloud vendors provide some means to achieve this. Here we can lift just the function that is resource expensive into its own managed function. A call comes into the API and the API then calls this external function. The function doesn’t need to be scaled because a new instance of the function is created each time it is requested. The operation is performed and the function is then removed. This reduces the impact of resource heavy operations on the API, leaving it to focus on processing lots of requests in a timely and efficient manner.
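The shape of this arrangement can be sketched as follows, assuming the expensive step is an image resize. The function name `resize-image`, the event fields and the `invoke` callable are all hypothetical. Each cloud vendor has its own client for calling a managed function, so `local_invoke` simply runs the function in-process to keep the sketch self-contained.

```python
from typing import Callable

# Stand-in for a vendor client call: (function name, payload) -> result.
Invoke = Callable[[str, dict], dict]


def resize_image_function(event: dict) -> dict:
    """The expensive work, packaged as a stateless function.

    A real function would process image bytes; this one only computes the
    thumbnail dimensions so that the sketch stays runnable.
    """
    new_width = event["max_width"]
    new_height = event["height"] * new_width // event["width"]
    return {"width": new_width, "height": new_height}


def upload_handler(request: dict, invoke: Invoke) -> dict:
    """API endpoint: hands the heavy step to the function, then responds."""
    payload = {"width": request["width"], "height": request["height"], "max_width": 400}
    thumbnail = invoke("resize-image", payload)
    return {"status": 200, "thumbnail": thumbnail}


def local_invoke(name: str, payload: dict) -> dict:
    """Local substitute for the cloud invocation, for illustration and testing."""
    functions = {"resize-image": resize_image_function}
    return functions[name](payload)
```

Passing `invoke` in as a parameter keeps the API code unaware of where the function runs, so the heavy work can move out of the API instance without changing the handler.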
Most APIs tend to have no or very few functions deployed in this way, but when needed, they can save a lot of time and money.
We are beginning to see entire APIs delivered as a series of functions in a FaaS environment. This can work well but, like most things in tech, it has advantages and disadvantages, which we will cover in a future article.
Most cloud vendors provide a managed database service. They give you somewhere to store application data and they manage the resources for you in the same way they do with an API. Databases can become quite complex, and the data you need to extract from them can hugely impact the resources required to support them. You could have a super efficient API with scaling working really well, but if the database is performing badly then everything will creak. There are a number of considerations here, such as how we structure the data we store and how we index it for better read performance. Those are beyond the scope of this article, but to be ready for scale it is likely we will need a database cluster. This, like many other parts of a solution, means we can spread the load between multiple nodes. It also provides an essential redundancy layer. Should one node fail or be unavailable, there are other nodes to process requests while new nodes are spun up. These nodes could exist in geographically different locations, thus providing a better chance that the data is always available.
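One common way a cluster spreads load is to send writes to a primary node and share reads across replicas, skipping any node that is down. The sketch below assumes that primary and replica layout, which is only one of several cluster designs. The `Node` and `Cluster` names are hypothetical, and managed database services normally handle this routing for you.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class Cluster:
    def __init__(self, primary: Node, replicas: list[Node]) -> None:
        self.primary = primary
        self.replicas = replicas
        self._next = 0  # index of the next replica to try

    def node_for_read(self) -> Node:
        """Rotate through replicas, skipping unhealthy ones."""
        for _ in range(len(self.replicas)):
            node = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            if node.healthy:
                return node
        # No replica is available, so fall back to the primary.
        if self.primary.healthy:
            return self.primary
        raise RuntimeError("no healthy database node available")

    def node_for_write(self) -> Node:
        """Writes always go to the primary in this simplified design."""
        if not self.primary.healthy:
            raise RuntimeError("primary is unavailable")
        return self.primary
```

If one replica is marked unhealthy, reads continue against the remaining replica, which is the redundancy described above.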
The database differs from the API layers in that it needs to persist data. If an API instance dies and is not recoverable, a new instance can be deployed to service requests and, as the underlying data is stored elsewhere, there is limited damage. If a database node dies and there are no other nodes, you can lose customer data. This is why the database is often considered the most important component to structure well for scaling and resilience.
If you would like a free consultation to look at your scaling needs, please reach out to us at firstname.lastname@example.org