
To maximize resource utilization in massive cloud data centers, we present LAVA, a new scheduling algorithm that continuously re-predicts VM lifetimes and adapts to how long VMs actually run. Imagine a Tetris-like puzzle game where pieces fall quickly onto a stack. Some fit just right. Some do not. The objective is to pack the blocks as tightly and efficiently as possible.
This game is a loose analogy for the challenge cloud data centers face many times per second as they decide where to place processing jobs, known as virtual machines (VMs). In this case, however, the “pieces” appear and vanish: some VMs last just a few minutes, others for days. Despite the initially unknown lifetimes of VMs, we still want to pack them onto as few physical servers as possible to maximize efficiency. If we knew a job’s approximate lifetime in advance, we could allocate resources far more effectively.

Using resources effectively is especially important in large data centers, for both economic and environmental reasons. Poor VM allocation can lead to “resource stranding”, where a server’s remaining resources are too small or unbalanced to host new VMs, effectively wasting capacity. Poor allocation also reduces the number of “empty hosts”, which are necessary for tasks like system updates and for provisioning large, resource-intensive VMs. This lack of complete information about VM behavior makes the classic bin packing problem even harder.

ML can help by using learned models to predict VM lifetimes. However, such approaches frequently rely on a single prediction made when the VM is created, and a single mistake can degrade performance and tie up an entire host for a long time. In “LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions”, we present a trio of algorithms, non-invasive lifetime-aware scoring (NILAS), lifetime-aware VM allocation (LAVA), and lifetime-aware rescheduling (LARS), designed to solve the bin packing problem of fitting VMs onto physical servers efficiently. Rather than relying on a one-time guess of a VM’s lifespan made at creation, our system uses a process we call “continuous reprediction”: as the VM continues to run, the model continuously and automatically updates its prediction of the expected remaining lifetime.
Predictions and probability distributions
One of the key insights driving this research is that VM lifetimes are often unpredictable and follow a long-tailed distribution.
For instance, the vast majority of VMs (88%) live for less than an hour, yet these short-lived VMs consume only 2% of total resources. As a result, the placement of a small number of long-lived VMs has a disproportionately large effect on overall resource efficiency. Rather than predicting a single average lifespan, which can be misleading for VMs with bimodal or highly varied lifetimes, we built an ML model that predicts a probability distribution over a VM’s lifetime. This approach, inspired by survival analysis, lets the model capture the inherent uncertainty in a VM’s behavior. More importantly, our system uses this distribution to continually update its predictions, asking: given that a VM has already been running for, say, five days, how much longer is it expected to last? The longer a VM runs, the more information the system has, and the more accurate its lifetime prediction becomes. Our algorithms are co-designed to use these predictions, actively correcting for errors and improving over time.
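To make this concrete, here is a minimal sketch of continuous reprediction, assuming the model outputs a probability for each of a handful of lifetime buckets and that lifetimes are uniform within a bucket. The bucket edges, probabilities, and helper names below are illustrative, not the production model’s.

```python
import numpy as np

# Illustrative lifetime bucket edges in hours (not the paper's actual buckets).
EDGES = np.array([0.0, 1.0, 6.0, 24.0, 7 * 24.0, 30 * 24.0, 90 * 24.0])

def expected_remaining_hours(bucket_probs, uptime):
    """E[L - t | L > t]: expected remaining lifetime given current uptime t,
    assuming the lifetime L is piecewise-uniform within each bucket."""
    lo = np.maximum(EDGES[:-1], uptime)                # truncate buckets at t
    hi = EDGES[1:]
    width = EDGES[1:] - EDGES[:-1]
    frac_alive = np.clip((hi - lo) / width, 0.0, 1.0)  # surviving mass per bucket
    mass = bucket_probs * frac_alive
    if mass.sum() == 0.0:                              # uptime past the last bucket
        return 0.0
    mid = (lo + hi) / 2.0                              # mean of truncated bucket
    return float((mass * mid).sum() / mass.sum() - uptime)

probs = np.array([0.70, 0.18, 0.06, 0.03, 0.02, 0.01])
print(expected_remaining_hours(probs, uptime=0.0))    # prediction at creation
print(expected_remaining_hours(probs, uptime=120.0))  # repredicted after 5 days
```

Even with most of the probability mass on short lifetimes, a VM that has already survived five days has a much longer expected remaining lifetime. That shift is exactly the signal continuous reprediction exposes to the scheduler.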
A new family of scheduling algorithms
Building on this more robust prediction model, we developed three novel algorithms to improve VM allocation.
1. Non-Invasive Lifetime-Aware Scoring (NILAS)
The non-invasive NILAS algorithm incorporates lifetime predictions into an existing scoring function. It ranks candidate hosts for a new VM by considering the predicted exit times of all existing VMs on each host, prioritizing hosts where all VMs are expected to exit around the same time in order to create more empty machines. Because NILAS relies on continuously updated repredictions, it corrects for errors over time and is therefore less sensitive to the accuracy of any single prediction. We have deployed NILAS in Borg, our large-scale cluster manager, where it significantly improves VM allocation.
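As a rough illustration, a NILAS-style scoring term might reward hosts whose VMs’ predicted exit times align. The sketch below is hypothetical: the real scorer in Borg combines lifetime with many other placement factors, and the function names here are our own.

```python
def nilas_score(existing_exit_times, new_vm_exit_time):
    """Lower is better: total host-time left occupied by straggling VMs.
    If every VM exits at the same moment, the score is 0 and the host
    empties all at once, which is the behavior NILAS rewards."""
    exits = list(existing_exit_times) + [new_vm_exit_time]
    latest = max(exits)
    return sum(latest - e for e in exits)

def pick_host(candidate_hosts, new_vm_exit_time):
    """candidate_hosts maps host id -> predicted exit times of its VMs."""
    return min(candidate_hosts,
               key=lambda h: nilas_score(candidate_hosts[h], new_vm_exit_time))

hosts = {"host-a": [10.0, 11.0], "host-b": [10.0, 500.0]}
print(pick_host(hosts, new_vm_exit_time=12.0))  # -> "host-a" (aligned exits)
```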
2. Lifetime-Aware VM Allocation (LAVA)
LAVA departs more fundamentally from existing scheduling mechanisms. Whereas NILAS packs VMs with similar lifetimes together, LAVA places shorter-lived VMs on hosts with one or more long-lived VMs. The objective is to fill resource gaps with VMs whose expected lifetimes are at least an order of magnitude shorter than the host’s anticipated lifespan, so they exit quickly without extending the host’s overall lifetime. LAVA also actively responds to mispredictions: if a VM outlives its predicted lifetime, LAVA increases the host’s anticipated lifespan accordingly. In simulations, this strategy minimizes fragmentation and ensures that hosts are eventually freed up.
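The sketch below illustrates the two LAVA behaviors just described, under stated assumptions: a host’s expected lifetime is driven by its longest-lived VM, placement requires the host to outlive the new VM by roughly an order of magnitude, and a misprediction bumps the host’s lifetime estimate to the next bucket. The bucket values, class, and function names are illustrative, not the paper’s code.

```python
from dataclasses import dataclass

LIFETIME_BUCKETS_HOURS = [1, 6, 24, 168, 720, 2160]  # illustrative buckets

@dataclass
class Vm:
    uptime_hours: float
    predicted_lifetime_hours: float

def host_expected_lifetime(vms):
    """A host is expected to live as long as its longest-lived VM."""
    expected = 0.0
    for vm in vms:
        if vm.uptime_hours >= vm.predicted_lifetime_hours:
            # Misprediction: the VM outlived its prediction, so bump the
            # estimate to the next bucket above the observed uptime.
            bumped = next((b for b in LIFETIME_BUCKETS_HOURS
                           if b > vm.uptime_hours), LIFETIME_BUCKETS_HOURS[-1])
            expected = max(expected, bumped)
        else:
            expected = max(expected, vm.predicted_lifetime_hours)
    return expected

def lava_accepts(new_vm, host_vms, ratio=10.0):
    """Accept a short-lived VM only on a host expected to live ~10x longer."""
    return host_expected_lifetime(host_vms) >= ratio * new_vm.predicted_lifetime_hours
```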
3. Lifetime-Aware Rescheduling (LARS)
LARS uses our lifetime predictions to minimize VM disruptions during defragmentation and maintenance. When a host needs to be drained, LARS sorts its VMs by predicted remaining lifetime and migrates the longest-lived VMs first; many shorter-lived VMs simply exit on their own before their turn to migrate arrives. In simulations, LARS cuts the total number of migrations needed by about 4.5 percent.

Overcoming the challenge of large-scale deployment
Developing effective algorithms and models is only part of the solution. To make them work reliably at scale, we had to rethink how we deploy models. A common practice is to serve ML models on dedicated inference servers. However, because those servers would themselves run on our cluster scheduling system, this would have created a circular dependency: a failure in the model serving layer could cascade into a failure of the scheduler itself, which is unacceptable for a mission-critical system. Our solution was to integrate the model directly into the Borg scheduler binary. This eliminated the circular dependency and ensured the model was tested and rolled out with the same rigor as any other scheduler code change. It also brought an additional benefit: the model’s median latency is just 9 microseconds (μs), about 780 times faster than a comparable approach using separate model servers. Such low latency is essential for using the model in performance-sensitive tasks like maintenance and defragmentation.

We also found that the sheer number of required predictions could become a bottleneck in our largest zones. To address this, we introduced a host lifetime score cache, which only updates predictions when a VM is added to or removed from a host, or when a host’s expected lifetime has passed. This caching mechanism ensures high performance and allows us to deploy the system across the entire fleet.
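Two of the mechanisms above lend themselves to short sketches: the LARS migration ordering, which drains a host longest-remaining-first so shorter-lived VMs can exit on their own, and a host lifetime score cache that recomputes a host’s score only when its VM set changes or its cached expected lifetime has elapsed. Both are hypothetical illustrations; the names and structures are ours, not Borg’s.

```python
import time

def lars_migration_order(vms, predicted_remaining):
    """Migrate longest-remaining VMs first; short-lived VMs at the back of
    the queue often exit naturally before they would be migrated."""
    return sorted(vms, key=predicted_remaining, reverse=True)

class HostScoreCache:
    """Cache host lifetime scores to avoid a model call per scheduling decision."""

    def __init__(self, predict_host_score):
        # predict_host_score(host_state) -> (score, expected_empty_timestamp)
        self._predict = predict_host_score
        self._cache = {}  # host_id -> (vm_set_version, expected_empty, score)

    def score(self, host_id, vm_set_version, host_state):
        entry = self._cache.get(host_id)
        if entry is not None:
            version, expected_empty, score = entry
            # Reuse the cached score unless the host's VM set changed or its
            # predicted lifetime has already elapsed.
            if version == vm_set_version and time.time() < expected_empty:
                return score
        score, expected_empty = self._predict(host_state)
        self._cache[host_id] = (vm_set_version, expected_empty, score)
        return score
```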
Results
Our NILAS algorithm has been running in Google’s production data centers since early 2024, and the results are clear and significant.

More empty hosts: In production pilots and fleet-wide rollouts, NILAS increased the number of empty hosts by 2.3–9.2 percentage points (pp). This metric is directly tied to efficiency, since a 1 pp improvement typically saves about 1 percent of a cluster’s capacity.

Reduced resource stranding: In pilots, NILAS reduced CPU stranding by about 3% and memory stranding by about 2%, meaning new VMs can use more of each host’s resources.

Further gains in simulation: LAVA offers an additional improvement of about 0.4 pp over NILAS in simulations, and LARS simulations suggest it can cut the number of VM live migrations required for maintenance by about 4.5 percent.
Conclusion
We believe this work is a fundamental step toward data center management that is increasingly optimized by machine learning systems. The methods we developed, such as predicting lifetime distributions, continuously repredicting as workloads run, and co-designing models with the systems that use them, are applicable to a variety of other tasks. We have demonstrated that advanced machine learning techniques can be integrated into the lower layers of the infrastructure stack without sacrificing reliability or latency, while still delivering significant efficiency gains.