Vertical and Horizontal Scaling

Generally speaking, there are two ways to scale an application: vertically and horizontally. When I say "scaling", I mean increasing the capacity of an application. For example, say we have a web server that, to handle roughly 1000 requests per second, uses about:

1/2 of a CPU core
1 GB of RAM

If we want to "scale up" to handle 2000 requests per second, we could double the CPU and RAM:

1 CPU core
2 GB of RAM

This is called "vertical scaling" because we're increasing the capacity of the application by increasing the resources available to it. We're scaling up. Scaling up works until it doesn't: you can only scale up as far as your hardware allows, that is, up to the number of CPU cores and the amount of RAM on the node the application runs on.
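
As a rough sketch of the arithmetic above, here is the vertical scaling example in Python. It assumes resource usage grows linearly with traffic, which is a simplification that real applications rarely follow exactly. The function names and the 4-core, 16 GB node are hypothetical, chosen only for illustration.

```python
# Baseline figures from the example above: ~1000 requests/second
# uses about half a CPU core and 1 GB of RAM.
BASE_RPS = 1000
BASE_CPU_CORES = 0.5
BASE_RAM_GB = 1.0

# Hypothetical node size, chosen only for illustration.
NODE_CPU_CORES = 4
NODE_RAM_GB = 16


def vertical_requirements(target_rps):
    """Return (cpu_cores, ram_gb) for one instance, ASSUMING linear scaling."""
    factor = target_rps / BASE_RPS
    return BASE_CPU_CORES * factor, BASE_RAM_GB * factor


def fits_on_node(cpu_cores, ram_gb, node_cpu_cores, node_ram_gb):
    """True if a single instance of that size fits on one node."""
    return cpu_cores <= node_cpu_cores and ram_gb <= node_ram_gb


for rps in (2000, 10000):
    cpu, ram = vertical_requirements(rps)
    ok = fits_on_node(cpu, ram, NODE_CPU_CORES, NODE_RAM_GB)
    print(f"{rps} req/s -> {cpu} cores, {ram} GB, fits on one node: {ok}")
```

Under these assumptions, 2000 requests per second fits comfortably, but 10000 requests per second would need 5 cores, which is more than the hypothetical 4-core node has. That's the "until it doesn't" part.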

The other way to scale is horizontally. Instead of increasing the resources available to the application, we increase the number of instances of the application (in Kubernetes, pods). Pods can be distributed across nodes, so we're no longer limited by the size of a single node: we can scale horizontally until we run out of capacity across all of our nodes. When working in a system like Kubernetes, it's generally better to scale horizontally than vertically.
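
To make the horizontal version concrete, here is a minimal sketch of the same kind of arithmetic in Python. It assumes every pod is the size of the original web server (about 1000 requests per second on half a CPU core and 1 GB of RAM) and that traffic is spread evenly across pods. The function names and the 4-core, 16 GB node are hypothetical, and the sketch ignores the resources a real node reserves for the system itself.

```python
import math

# One pod = one instance of the web server from the example above.
POD_RPS = 1000
POD_CPU_CORES = 0.5
POD_RAM_GB = 1.0

# Hypothetical node size, chosen only for illustration.
NODE_CPU_CORES = 4
NODE_RAM_GB = 16


def replicas_needed(target_rps):
    """Number of pods needed, rounding up so we never under-provision."""
    return math.ceil(target_rps / POD_RPS)


def pods_per_node():
    """How many pods fit on one node: limited by CPU or RAM, whichever runs out first."""
    by_cpu = math.floor(NODE_CPU_CORES / POD_CPU_CORES)
    by_ram = math.floor(NODE_RAM_GB / POD_RAM_GB)
    return min(by_cpu, by_ram)


def nodes_needed(target_rps):
    """Number of nodes needed to host all the pods."""
    return math.ceil(replicas_needed(target_rps) / pods_per_node())


for rps in (2000, 10000):
    print(f"{rps} req/s -> {replicas_needed(rps)} pods on {nodes_needed(rps)} node(s)")
```

With these numbers, 10000 requests per second needs 10 pods. Only 8 of them fit on one hypothetical node, so the rest land on a second node, and no single pod ever has to grow.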