We talk about scalability and scaling laws, but what does scalability actually mean? Let's explore the idea, starting with the example of maps. Maps have a unique quality: they demonstrate what I call 'perfect scalability.' When you create a map, the effort required to draw it is independent of the size of the area being represented. Whether you're mapping a small city or an entire country, you choose a scale, and the process remains essentially the same. The density of information on the map is uniform, regardless of the size of the real-world object. Of course, you give up detail as the area you represent grows. This perfect scalability shows how a system can hold effort fixed while representing subjects that span vast dimensions.
While maps illustrate perfect scalability, most real-world systems are scalable only along certain dimensions. We can see what this means through the example of physical buildings. Our ability to build houses scales well horizontally: building 100 houses requires roughly 10 times the effort of building 10. This is linear scaling along the horizontal dimension. Vertical scalability, building taller structures, does not behave the same way.
Going from a single-story house to a 10-story building introduces nonlinear complexities: structural reinforcement, new materials, and safety systems. The cost and resources needed are not 10 times greater but many times more. Likewise, a 100-story skyscraper is far more expensive than ten 10-story buildings.
And despite our ingenuity, there are hard limits. For instance, we can’t build a skyscraper 1,000 stories tall, no matter how much effort we apply. This example illustrates how scalability depends on the specific dimension being scaled and how each dimension has its own unique constraints and limits.
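The contrast between the two dimensions can be made concrete with a toy cost model. Everything here is an assumption chosen for illustration: the superlinear exponent of 1.8 is arbitrary, not an engineering figure.

```python
# Toy cost model contrasting horizontal (linear) and vertical
# (superlinear) scaling. All constants are illustrative assumptions.

def horizontal_cost(n_houses: int, unit_cost: float = 1.0) -> float:
    """Building n identical houses scales linearly with n."""
    return n_houses * unit_cost

def vertical_cost(stories: int, unit_cost: float = 1.0,
                  exponent: float = 1.8) -> float:
    """Building taller scales superlinearly with height
    (exponent 1.8 is an arbitrary illustrative choice)."""
    return unit_cost * stories ** exponent

# 100 houses cost exactly 10x what 10 houses cost:
print(horizontal_cost(100) / horizontal_cost(10))   # linear: 10.0

# A 100-story tower costs several times more than
# ten 10-story buildings built side by side:
print(vertical_cost(100) / (10 * vertical_cost(10)))  # superlinear: > 1
```

With the chosen exponent, the single tower comes out roughly six times as expensive as the ten smaller buildings, which is the shape of the argument: the same total floor area, stacked vertically, costs disproportionately more.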
In modern AI systems, scalability plays a critical role in achieving performance. The capabilities of large language models are shaped by scaling laws: as we scale these models up, by increasing the number of parameters, compute, data, and training time, their performance improves predictably. Cascading stages such as pre-training, fine-tuning, and test-time adjustments extend the performance envelope, allowing each stage to overcome the limitations of the previous one. By strategically scaling along multiple dimensions, we push AI systems beyond their initial constraints, optimizing performance across the entire lifecycle of the model.
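Empirical scaling laws typically take a power-law form: loss falls smoothly as model size and training data grow, down toward an irreducible floor. The sketch below shows that shape; the coefficients are arbitrary placeholders, not fitted values from any published study.

```python
# Sketch of the power-law shape reported in LLM scaling-law studies.
# All constants are placeholder assumptions, not fitted coefficients.

def predicted_loss(n_params: float, n_tokens: float,
                   irreducible: float = 1.7,
                   a: float = 400.0, b: float = 400.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Loss = irreducible floor + power-law terms in model and data size."""
    return irreducible + a / n_params ** alpha + b / n_tokens ** beta

# Growing the model 10x at fixed data shrinks the model-size term:
small = predicted_loss(1e8, 1e10)   # 100M params, 10B tokens
large = predicted_loss(1e9, 1e10)   # 1B params, same data
print(small, large)  # the larger model has lower predicted loss
```

The key property is that each term decays smoothly and independently, which is why scaling parameters, data, or both yields predictable returns, and why the irreducible term caps what scale alone can achieve.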
One of the most intriguing aspects of scalability in AI is the emergence of unexpected capabilities. As we scale large language models, their performance can undergo nonlinear jumps. For example, capabilities like logical reasoning, translation, or even coding emerge at certain scale thresholds, even though they were never explicitly programmed into the system. This is similar to phase transitions in physics, where a system suddenly changes behavior once a critical point is reached. These latent capabilities surprise many people because they are not directly tied to the original design goals but are instead byproducts of the model's scale and depth. Such emergent capabilities challenge our assumptions about how systems achieve complex behavior.
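The phase-transition analogy can be sketched with a toy curve: task performance stays near zero across orders of magnitude of scale, then jumps sharply past a critical point. The sigmoid form, threshold, and steepness below are arbitrary illustrative choices, not a model of any real benchmark.

```python
import math

# Toy emergence curve: flat for a long stretch of scale, then a
# rapid nonlinear jump past a critical point, as in a phase
# transition. Threshold and steepness are illustrative assumptions.

def task_performance(log_scale: float, threshold: float = 10.0,
                     steepness: float = 4.0) -> float:
    """Sigmoid in log-scale: near 0 below threshold, near 1 above it."""
    return 1.0 / (1.0 + math.exp(-steepness * (log_scale - threshold)))

# Sweep scale (e.g. log10 of parameter count) across six orders
# of magnitude and watch the capability "switch on" near 10:
for log_n in (6, 8, 10, 12):
    print(log_n, round(task_performance(log_n), 3))
```

Measured at a few scales, such a curve looks like a sudden jump rather than smooth improvement, which is why emergent capabilities are hard to anticipate from smaller models.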