Vertical vs Horizontal Scaling

November 19, 2023

Overview

There are only two ways to scale a system so that it can handle more load -- vertically (up) or horizontally (out):

Vertically (or up): make the thing larger. Conversely, down: make the thing smaller to reduce resource usage.
Horizontally (or out): add more of the thing. Conversely, in: remove some to reduce resource usage.

Horizontal Scaling

Scaling a system out adds complexity and more points of failure. For availability, however, the benefits far outweigh the costs when the application nodes are scaled out as replicas. This seems paradoxical: adding replicas technically adds points of failure, yet the system as a whole becomes more available because it keeps serving requests as long as at least one replica is healthy.

Adding replicas is not the only way to scale an application horizontally. You could also break the application into smaller purpose-specific services. If availability is important, this approach requires mitigating the additional points of failure, since a request may now depend on several services being up at once. A common mitigation is to scale each service horizontally with replicas, which provides failover when a node is lost even if the extra capacity is not needed.
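The availability trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming node failures are independent (real systems only approximate this) and ignoring the availability of the load balancer itself. The 99% figure and the function names are hypothetical, chosen only for illustration.

```python
from math import prod


def replicated_availability(node_availability: float, replicas: int) -> float:
    """Availability of a replica group that is up while at least one replica is up.

    Assumes failures are independent, which is an idealization.
    """
    return 1.0 - (1.0 - node_availability) ** replicas


def chained_availability(service_availabilities: list[float]) -> float:
    """Availability of a request path that needs every service in the chain."""
    return prod(service_availabilities)


if __name__ == "__main__":
    # Hypothetical figure: each node is up 99% of the time.
    single = 0.99
    print(f"1 replica:  {replicated_availability(single, 1):.6f}")
    print(f"3 replicas: {replicated_availability(single, 3):.6f}")

    # Splitting into three services that must all respond lowers availability...
    print(f"3 chained services: {chained_availability([single] * 3):.6f}")

    # ...unless each service is itself replicated.
    replicated = replicated_availability(single, 3)
    print(f"3 chained, 3 replicas each: {chained_availability([replicated] * 3):.6f}")
```

Under these assumptions, replicas multiply the probabilities of failure together, while chained services multiply the probabilities of success together, which is why replication helps availability and decomposition alone hurts it.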

Horizontal scaling generally adds latency, which can make it infeasible for use cases where performance is most critical. A load balancer now sits in front of the compute nodes and routes traffic over a network, and each extra hop adds latency on the order of milliseconds.

More Replicas

The primary way to scale an application is to add more replicas (compute nodes) to a scaling group that sits behind a load balancer. Keeping the application working seamlessly despite this added complexity, especially during scale-out and scale-in events, requires care. This care includes:

Storing any persistent data somewhere separate from the application nodes (caches can be stored either in the application nodes or elsewhere centrally).
Ensuring graceful shutdown of the application nodes (so that any in-progress requests complete before the node is shut down).
Ensuring that no essential state is stored between requests in the application nodes (caches should be non-essential so that they can be rebuilt if a node is lost).
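The graceful-shutdown point in the list above can be sketched as follows. This is a minimal illustration, not a production implementation: the `RequestDrainer` class and its method names are hypothetical, and a real service would typically call `shutdown` from a termination-signal handler after the load balancer has stopped sending it traffic.

```python
import threading


class RequestDrainer:
    """Tracks in-flight requests so a node can stop accepting work and drain."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_begin(self) -> bool:
        """Register a new request; returns False once shutdown has started."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Mark one request as finished and wake any waiting shutdown call."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def shutdown(self, timeout: float) -> bool:
        """Stop accepting requests, then wait up to `timeout` seconds to drain.

        Returns True if every in-flight request finished in time.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


if __name__ == "__main__":
    import time

    drainer = RequestDrainer()

    def handle_request() -> None:
        if not drainer.try_begin():
            return  # Node is shutting down; another replica should take the request.
        try:
            time.sleep(0.2)  # Stand-in for real work.
        finally:
            drainer.end()

    worker = threading.Thread(target=handle_request)
    worker.start()
    time.sleep(0.05)  # Give the request time to register.
    print("drained:", drainer.shutdown(timeout=2.0))
    worker.join()
```

The timeout matters: a node that waits forever for a stuck request blocks the scale-in event, so the caller decides what to do when `shutdown` returns False.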

Periodic Workloads

A periodic workload is one where work must be completed on an interval. An example of this is an hourly batch-processing service. Periodic workloads can benefit from horizontal scaling just like most workloads can. The important consideration is to make sure the amount of time it takes to consume and process each period's batch is less than the duration of the period itself. Ideally, this type of workload is completed using a highly elastic compute solution that can be scheduled or triggered, then taken down once finished.
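The constraint that each batch must finish within its period can be turned into a rough sizing calculation. The sketch below assumes items are independent and can be split evenly across workers; the `workers_needed` function, its parameters, and the 80% headroom default are hypothetical choices for illustration.

```python
import math


def workers_needed(items: int, seconds_per_item: float, period_seconds: float,
                   headroom: float = 0.8) -> int:
    """Smallest worker count that finishes a batch within `headroom` of the period.

    Assumes items are independent and split evenly across workers.
    """
    if items <= 0:
        return 0
    budget = period_seconds * headroom      # Time we allow ourselves per period.
    total_work = items * seconds_per_item   # Sequential processing time.
    return max(1, math.ceil(total_work / budget))


if __name__ == "__main__":
    # Hypothetical hourly batch: 100,000 items at 0.1 seconds each.
    print(workers_needed(items=100_000, seconds_per_item=0.1, period_seconds=3600))
```

Adding workers only helps when the work can be parallelized. If a single item takes longer than the period, no number of workers brings the batch under the deadline.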

Smaller Purpose-Specific Services (aka Microservices Architecture)

When applications grow larger, simply scaling a monolithic application horizontally is no longer enough, and additional scaling mechanisms are required. This is where Microservices Architecture typically comes into play. With this architecture, the parts of the application that experience more load can be scaled independently of the rest. If scale is a known requirement from the beginning of a project, this architecture can be adopted from the start. Kubernetes is a robust and sophisticated tool for running this kind of architecture.
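Independent scaling can be illustrated by sizing each service from its own load. The sketch below is illustrative only: the service names, load figures, and per-replica capacities are made up, and the `replicas_for` helper is hypothetical. It keeps a floor of two replicas per service so that each one retains failover.

```python
import math


def replicas_for(load_rps: float, capacity_rps_per_replica: float,
                 min_replicas: int = 2) -> int:
    """Replicas needed for a load, keeping a floor for failover."""
    return max(min_replicas, math.ceil(load_rps / capacity_rps_per_replica))


# Hypothetical services with made-up load and per-replica capacity figures.
services = {
    "checkout": {"load_rps": 50, "capacity_rps": 100},
    "search": {"load_rps": 1800, "capacity_rps": 200},
    "profile": {"load_rps": 120, "capacity_rps": 150},
}

# Each service gets its own replica count instead of one count for the whole app.
plan = {
    name: replicas_for(s["load_rps"], s["capacity_rps"])
    for name, s in services.items()
}

if __name__ == "__main__":
    for name, count in plan.items():
        print(f"{name}: {count} replicas")
```

Here only the busy service grows, while the quieter services stay at the failover floor.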

Going Too Small

It's easy to go overboard and break a system's microservices into smaller and smaller pieces until they become nanoservices. This adds both cost and maintenance overhead. This nuance is discussed in Microservices vs Monolithic.

Vertical Scaling

Scaling a system up is straightforward -- swap its machine (or compute node) for a larger one with more CPU, RAM, or storage. Doing this without downtime requires the same precautions you would take for a horizontally scaled system.

Vertical scaling is a viable strategy only for moderate scaling requirements because it reaches a ceiling. Machines cannot become infinitely large, so horizontal scaling eventually becomes necessary.
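The ceiling can be pictured as running off the end of a list of machine sizes. The sketch below uses a hypothetical catalog that does not correspond to any real provider's offering; `MachineSize`, `CATALOG`, and `smallest_fit` are illustrative names.

```python
from typing import NamedTuple, Optional


class MachineSize(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical catalog, ordered smallest to largest; not a real provider's offering.
CATALOG = [
    MachineSize("small", 2, 8),
    MachineSize("medium", 8, 32),
    MachineSize("large", 32, 128),
    MachineSize("xlarge", 96, 384),
]


def smallest_fit(vcpus: int, memory_gib: int) -> Optional[MachineSize]:
    """Return the smallest machine that satisfies the request, or None if the
    request exceeds the largest size (the vertical-scaling ceiling)."""
    for size in CATALOG:
        if size.vcpus >= vcpus and size.memory_gib >= memory_gib:
            return size
    return None


if __name__ == "__main__":
    print(smallest_fit(4, 16))     # Fits on a mid-sized machine.
    print(smallest_fit(128, 256))  # Exceeds the catalog: scale out instead.
```

A `None` result is the signal that the workload has outgrown vertical scaling and must be spread across several machines.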

To be updated with diagrams.