VMware ESXi Memory Management Reimagined: NVMe Tiering vs Traditional Swap

Introduction

This research article explores two primary strategies for memory management in a VMware ESXi-based VDI (Virtual Desktop Infrastructure) environment: swapping and the newly introduced NVMe memory tiering. NVMe tiering enables the use of high-speed NVMe storage as an extension of system memory, offering a new approach to scaling virtual machine workloads and a potentially cost-effective alternative to additional physical DRAM. This study evaluates the performance and resource implications of each method, with a particular focus on how NVMe memory tiering compares to traditional approaches under real-world VDI conditions.

Understanding NVMe Memory Tiering

Virtual Desktop Infrastructure (VDI) places unusually tight performance and user experience requirements on the underlying virtualization platform. Unlike many server workloads, interactive desktops are sensitive to latency spikes, inconsistent response times, and jitter introduced by resource contention. For this reason, common guidance for ensuring performance in desktop virtualization ecosystems is to reserve 100% of the configured virtual machine memory in the case of a full desktop.

Full reservation ensures that each desktop’s memory pages remain resident in host DRAM and are not subject to reclamation techniques such as ballooning, compression, or hypervisor swapping. You can read more about some of those techniques here. These reclamation techniques introduce a performance penalty in VDI scenarios when physical DRAM is exhausted.

The full memory reservation approach delivers predictable performance but at a high capital cost: DRAM has to be sized for peak concurrent sessions, and the reservation policy limits resource consolidation ratios.

Over time, this approach has been debated. Instead of locking all guest memory, administrators could deploy VDI pools without memory reservations and rely on ESXi’s memory management stack (transparent page sharing (TPS), ballooning, memory compression, swap to host cache on SSD, and ultimately regular hypervisor swapping) to safely overcommit physical memory.

This strategy improves density and lowers cost per desktop, but it introduces performance and stability risks. Once the memory of a physical host is exhausted, reclamation techniques such as compression or swapping become active. When memory reclamation is active, users can experience degraded performance, such as delays in application launches and stuttering multimedia playback, and login storms can take longer to complete. Balancing cost efficiency against end-user experience becomes an ongoing design tradeoff.

A New Variable: Memory Tiering over NVMe

NVMe memory tiering is available for production workloads in VMware Cloud Foundation (VCF) 9. Omnissa Horizon 2506 is the first version of Horizon to support VCF and VMware vSphere Foundation (VVF) 9.0. More information about the VCF/VVF bundles can be found here: VMware by Broadcom Dramatically Simplifies Offer Lineup and Licensing Model - Broadcom News and Stories

vSphere 8.0 Update 3 introduced Memory Tiering over NVMe as a Technical Preview capability. With memory tiering enabled, ESXi can present locally attached NVMe devices as an auxiliary (slower) memory tier that extends the effective memory footprint of a host beyond installed DRAM.

The hypervisor preferentially places hotter, latency-critical pages in DRAM while migrating colder pages to the NVMe tier. In principle, this could narrow the historical gap between “all memory reserved in DRAM” and “unreserved memory subject to swap” by offering a middle ground in which excess pages spill to high-performance flash rather than to traditional disk-backed swap files.
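To make the hot/cold distinction concrete, the following is a minimal conceptual sketch of recency-based page tiering. It is not ESXi’s actual placement algorithm; the class name, the threshold, and the tier labels are illustrative only.

```python
import time

COLD_AFTER_SECONDS = 60.0  # illustrative threshold; ESXi uses its own heuristics


class TieredPageTable:
    """Toy model: pages untouched for a while become candidates for the NVMe tier."""

    def __init__(self):
        self.last_access = {}  # page number -> last access timestamp
        self.tier = {}         # page number -> "dram" or "nvme"

    def touch(self, page: int) -> None:
        """Record an access; an NVMe-resident page is promoted back to DRAM."""
        self.last_access[page] = time.monotonic()
        self.tier[page] = "dram"

    def demote_cold_pages(self) -> list[int]:
        """Move pages that have not been accessed recently to the NVMe tier."""
        now = time.monotonic()
        demoted = []
        for page, last in self.last_access.items():
            if self.tier[page] == "dram" and now - last > COLD_AFTER_SECONDS:
                self.tier[page] = "nvme"
                demoted.append(page)
        return demoted
```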

In principle, any NVMe device can be used as long as it meets the required endurance, provides low latency and high IOPS, has sufficient capacity for memory tiering workloads, and is listed on the VMware Hardware Compatibility List (HCL) for production use. The NVMe device should deliver at least 100,000 writes per second and have an endurance rating of at least 3 Drive Writes Per Day (DWPD).
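As a quick sanity check for the endurance requirement, DWPD can be derived from a device’s rated total bytes written (TBW) and its warranty period. The figures below are illustrative example values for an Optane-class device, not a quoted specification.

```python
def dwpd(rated_tbw_tb: float, capacity_tb: float, warranty_years: float = 5.0) -> float:
    """Drive Writes Per Day = total rated writes / (capacity * warranty period in days)."""
    return rated_tbw_tb / (capacity_tb * warranty_years * 365)


# Illustrative example: a 750 GB class device rated for roughly 41 PBW over 5 years.
capacity_tb = 0.75
rated_tbw_tb = 41_000  # 41 PBW expressed in terabytes

print(f"Endurance: {dwpd(rated_tbw_tb, capacity_tb):.0f} DWPD (guideline: >= 3 DWPD)")
```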

Because the feature remains in technical preview, production support, full workload qualification, and automated tuning heuristics are still maturing. Nevertheless, the potential implications for VDI are significant:

  • Higher desktop density per host without purchasing equivalent DRAM capacity
  • Reduced performance penalty under memory pressure compared with conventional swapping to slower storage
  • Improved cost per concurrent user when DRAM pricing or server DIMM slot limits constrain scaling
  • New operational considerations (device endurance, tier sizing ratios, monitoring metrics, failure semantics) that differ from classic swapfile behavior

Problem Statement

Organizations operating large VDI environments face mounting pressure to increase user density and reduce infrastructure cost while maintaining, if not improving, perceived desktop responsiveness.

Traditional design choices often come down to a decision between fully reserving DRAM (costly but predictable) and accepting the risks of overcommitment (efficient but potentially erratic). The emergence of Memory Tiering over NVMe suggests a third path, yet data quantifying its effectiveness for real-world VDI workloads are scarce.

Research Objectives

This study seeks to generate evidence on whether NVMe tiering can benefit VDI deployments by maintaining, or even improving, desktop density per host without compromising the end-user experience.

Specifically, the study will measure host-level efficiency metrics across two memory provisioning strategies, including:

  • VM density
  • Memory reclamation behavior
  • CPU overhead (e.g., CPU usage, CPU latency) associated with page movement between tiers

Both provisioning strategies adhere to established CPU sizing best practices, limiting the configuration to a 1 pCPU : 5 vCPU ratio, excluding hyperthreading from the calculation.

The two memory provisioning strategies are:

  1. Conventional Overcommitment (No Reservation):
    • 256 GB RAM available on host
    • 560 GB RAM provisioned to VMs
  2. NVMe Memory Tiering Enabled:
    • 256 GB RAM + 700 GB NVMe tier = 956 GB total memory
    • 560 GB RAM provisioned to VMs

This setup allows for a direct comparison of how NVMe Tiering affects memory efficiency, CPU overhead, and overall host performance relative to traditional swapping.
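The sizing arithmetic behind the two scenarios can be summarised in a few lines of Python. The figures come directly from the test configuration above; the helper name overcommit is ours, and the 16 GB reserved for ESXi itself is ignored to stay aligned with the 956 GB total quoted above.

```python
def overcommit(provisioned_gb: float, available_gb: float) -> float:
    """Ratio of memory provisioned to VMs versus memory available on the host."""
    return provisioned_gb / available_gb


HOST_DRAM_GB = 256       # installed DRAM (ESXi overhead ignored for simplicity)
NVME_TIER_GB = 700       # NVMe tier capacity (tiering scenario only)
PROVISIONED_GB = 70 * 8  # 70 desktops x 8 GB = 560 GB
PCPU = 28                # physical cores, hyperthreading excluded
VCPU = 70 * 2            # 140 vCPU

print(f"vCPU:pCPU ratio        : {VCPU / PCPU:.0f}:1")
print(f"Scenario 1 (swap only) : {overcommit(PROVISIONED_GB, HOST_DRAM_GB):.2f}x DRAM overcommit")
print(f"Scenario 2 (NVMe tier) : {overcommit(PROVISIONED_GB, HOST_DRAM_GB + NVME_TIER_GB):.2f}x "
      f"of the {HOST_DRAM_GB + NVME_TIER_GB} GB combined tier capacity")
```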

Environment Configuration

Omnissa Horizon Instant Clone pools running Windows 11 Enterprise 23H2 single-session desktops will serve as the representative VDI workload. The testing environment simulates realistic concurrency patterns such as login storms, steady-state knowledge worker activity, and memory utilization driven by common enterprise applications. Short breaks are added to the workload to generate cold pages.

When physical RAM is exhausted, less frequently accessed (cold) pages are evicted from DRAM. Traditionally, these pages are written to disk-based swap files (secondary storage). Cold pages refer to memory regions that are infrequently accessed, while hot pages are frequently used and remain in DRAM for fast access.

The NVMe device is dedicated to memory tiering to isolate its performance characteristics, and because the feature is in technical preview, results are framed as directional rather than prescriptive for production. The standard GO-EUC testing methodology was used to gather these results; more information on this subject can be found on our website.

Hardware used in this research

  • Cisco UCSC-C240-M4S2
  • 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz, 14 cores each (28 cores total), 28 threads each (56 threads total)
  • 256 GB RAM
  • 16 GB RAM reserved for ESXi
  • 2x SATA SSD 960GB 6 Gbps used as local datastores
  • Intel Optane SSD DC P4800X 750 GB for NVMe memory tiering (700 GiB usable in ESXi)
  • vSphere 8.0 U3

Sizing

  • Transparent Page Sharing (TPS) disabled
  • 70 x Windows 11 Enterprise VDI Horizon instant clones, smart provisioning mode B (no parent VM)
    • 2 vCPU and 8 GB RAM assigned to each VDI
  • Omnissa Horizon 8 2503
  • Windows OS Optimization Tool for Omnissa Horizon 2503
    • Default template, with Firewall, Antivirus, and Security Center additionally disabled and OneDrive usage blocked
  • No profile management configured
  • Basic desktop management based on DEM (Dynamic Environment Manager)
  • No vTPM
  • All VDI desktops are provisioned up front so the machines can settle before load. A new test is started once CPU activity settles down after the provisioning operation.

Test Configuration

  • 70 Omnissa Horizon Instant Clones Virtual Desktops
  • 70 login sessions in a 35 minute time frame (2 per minute)
  • 30 minutes test duration
  • No memory reservation
  • 560 GB RAM provisioned
  • 140 vCPU provisioned on 28 pCPU (5:1 ratio)
  • NVMe Tiering Enabled

VMware Performance Counters

Metric | Scope | Type | Description | High Value Suggests | Availability
--- | --- | --- | --- | --- | ---
Average Host CPU | Host | CPU | % CPU utilization | Host CPU saturation | Always
Host Memory Usage | Host | Memory | % of ESXi host physical memory in use | Memory pressure on host | Always
VM CPU Latency | VM | CPU | Time the VM waits for CPU scheduling | Host CPU contention | Always
Memory Granted (avg) | VM | Memory | Total memory allocated to the VM | May be overprovisioned if much higher than Active | Always
Memory Active (avg) | VM | Memory | RAM recently accessed by the VM (actual working set) | VM is actively using that memory | Always
VM Memory Latency (avg) | VM | Memory | Time to access memory by the VM | Contention, tiered memory effects, or swapping | Always
vmmemctl_average | VM | Memory | Memory reclaimed via the balloon driver (vmmemctl), also known as ballooning | Moderate memory pressure on host | Always (VMware Tools required)
Host Swap Rate Out (avg) | VM | Memory | Rate of pages moved from RAM to disk | Host is under memory stress; swapping has started | Always
Host Swap Rate In (avg) | VM | Memory | Rate of pages swapped from disk back to RAM | Memory was previously swapped out and is now needed again | Always

Tiered Memory (Only with NVMe Memory Tiering Enabled)

Metric | Scope | Type | Description | High Value Suggests | Availability
--- | --- | --- | --- | --- | ---
VM Tier 0 RAM | VM | Tiered Memory | Memory pages placed in DRAM (fastest memory) | Memory resides in the optimal location | Only with NVMe Memory Tiering
VM Tier 1 NVMe RAM | VM | Tiered Memory | Memory pages placed in NVMe-backed memory (slower than DRAM) | Cold pages offloaded to NVMe | Only with NVMe Memory Tiering
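The counters in the tables above can be collected programmatically through the vSphere performance manager. The following is a minimal pyVmomi sketch, assuming the standard counter names such as mem.active.average and mem.swapinRate.average; the vCenter address, credentials, and VM name are placeholders, and the tiering-specific counters are omitted because their exact names depend on the release.

```python
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.lab.local"             # placeholder
USERNAME = "administrator@vsphere.local"  # placeholder
PASSWORD = "***"                          # placeholder
VM_NAME = "W11-IC-001"                    # placeholder instant-clone desktop

# Counters of interest (group.name.rollup), matching the tables above.
COUNTER_NAMES = [
    "cpu.latency.average",
    "mem.granted.average",
    "mem.active.average",
    "mem.latency.average",
    "mem.vmmemctl.average",
    "mem.swapinRate.average",
    "mem.swapoutRate.average",
]


def main() -> None:
    ctx = ssl._create_unverified_context()  # lab only: skip certificate validation
    si = SmartConnect(host=VCENTER, user=USERNAME, pwd=PASSWORD, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        perf = content.perfManager

        # Map "group.name.rollup" names to numeric counter IDs.
        counter_ids = {}
        for counter in perf.perfCounter:
            full_name = f"{counter.groupInfo.key}.{counter.nameInfo.key}.{counter.rollupType}"
            if full_name in COUNTER_NAMES:
                counter_ids[full_name] = counter.key

        # Locate the virtual machine by name.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == VM_NAME)

        # Query the most recent real-time samples (20-second interval).
        metric_ids = [vim.PerformanceManager.MetricId(counterId=cid, instance="")
                      for cid in counter_ids.values()]
        spec = vim.PerformanceManager.QuerySpec(
            entity=vm, metricId=metric_ids, intervalId=20, maxSample=90)

        for entity_metrics in perf.QueryPerf(querySpec=[spec]):
            for series in entity_metrics.value:
                name = next(n for n, cid in counter_ids.items()
                            if cid == series.id.counterId)
                values = series.value or []
                average = sum(values) / len(values) if values else 0.0
                print(f"{name}: average sample value {average:.1f}")
    finally:
        Disconnect(si)


if __name__ == "__main__":
    main()
```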

Hypothesis and Results

We evaluated two memory management scenarios to understand how NVMe Memory Tiering affects host memory usage and system performance. In the first scenario, NVMe Memory Tiering was enabled, allowing cold memory pages to be offloaded to the NVMe tier. In the second scenario, tiering was disabled, and the system relied entirely on traditional disk-based swapping.

The general expectation is that NVMe Memory Tiering will outperform standard swapping because NVMe storage is significantly faster than traditional SAS or SSD-based storage. As a result, NVMe Tiering is expected to provide a more responsive experience for VDI users while offering a cost-effective solution for environments constrained by DRAM capacity.

Host Level Metrics

In the NVMe Memory Tiering scenario, CPU usage is higher than in the swapping-only scenario, reflecting the additional work of managing pages across two memory tiers.

In the NVMe Memory Tiering scenario, active memory is lower and swap-in/out activity is reduced, reflecting offloading of cold pages to the NVMe tier. In contrast, the swapping-only scenario shows higher active memory along with significantly increased swap activity.

Host Memory Swap Rate

In the NVMe Memory Tiering scenario, some swapping still occurs, primarily during periods of high memory demand such as login storms. In comparison, the swapping-only scenario shows a higher overall swapping rate.

VM Level Metrics

In the NVMe Memory Tiering scenario, CPU latency is higher, indicating increased CPU contention compared to the non-tiered configuration.

VM Memory Granted

In the NVMe Memory Tiering scenario, the average memory granted counter is higher compared to the swapping-only scenario. This suggests that the system can allocate more memory to VMs, potentially due to more efficient memory utilization enabled by NVMe Memory Tiering.

VM Memory Active

In the NVMe Memory Tiering scenario, the average VM Memory Active counter is lower compared to the swapping-only scenario, indicating that less DRAM is actively used by VMs due to the offloading of cold pages to the NVMe tier.

VM vmmemctl (Ballooning)

In the NVMe Memory Tiering scenario, ballooning still occurs but at a lower level than in the swapping-only scenario, particularly during login storms, as cold pages are offloaded to the NVMe tier.

VM Memory Latency Average

In the NVMe Memory Tiering scenario, memory latency is lower, whereas the swapping-only scenario exhibits higher latency due to disk-based swapping of cold pages.

NVMe Memory Tiering Metrics

The following per-VM metrics are only available when NVMe tiering is enabled, so they apply only to the NVMe Tiering scenario.

VM Tier 0 RAM

VM Tier 1 NVMe RAM

In the NVMe Memory Tiering scenario, once user sessions settle, cold memory pages are migrated to the NVMe tier for more efficient memory management.

Conclusion

This research demonstrates that enabling NVMe Memory Tiering in a VMware ESXi-based VDI environment introduces both benefits and trade-offs.

On the one hand, NVMe Tiering increases CPU overhead because the hypervisor must manage memory across two tiers: DRAM and NVMe. This results in 10 to 20% higher CPU usage compared to traditional swap-only configurations. CPU latency was observed to be lower when using conventional swapping, likely due to the simpler memory management involved in standard swap operations.

On the other hand, systems without NVMe Tiering showed significantly higher swap activity. Without an intermediate memory tier, cold memory pages are moved to disk-based swap. NVMe Tiering mitigates this by offloading less frequently accessed pages to fast NVMe storage before traditional swapping becomes necessary. Although ballooning and some initial swapping still occurred with NVMe Tiering enabled, the total amount of swap was reduced after workloads stabilized. This demonstrates that tiering can effectively reduce memory pressure under typical VDI conditions.

A key finding of this research is that memory latency was consistently lower when NVMe Tiering was used. While NVMe is slower than DRAM, it is considerably faster than traditional swap locations such as SATA/SAS SSDs or hard disks. As a result, the memory subsystem was able to respond faster when under pressure, leading to better overall desktop responsiveness.

This study was conducted with Transparent Page Sharing (TPS) disabled, which is the default setting in modern ESXi deployments. With TPS unavailable, memory reclamation begins with ballooning and compression. When NVMe Tiering is enabled, cold memory pages are staged to NVMe rather than being swapped out immediately. However, ballooning and swapping still occur, especially during peak load or boot storms. Swapping is not a last-resort mechanism that only activates after tiering is exhausted. It can operate concurrently with NVMe Tiering, particularly when hot memory pages cannot be served quickly enough from DRAM or NVMe. This behavior confirms that all memory management techniques remain active and complementary when tiering is in use.

An important operational insight is that full memory reservation is not strictly required, regardless of whether NVMe Tiering is enabled. Both traditional swap and tiering-based memory management work without reserving 100 percent of virtual machine memory. This allows for greater memory overcommitment and higher virtual desktop density, although each approach carries different trade-offs. Conventional swapping reduces CPU load but increases memory latency. NVMe Tiering improves memory responsiveness but consumes additional CPU resources.

In summary, NVMe Memory Tiering is not a full substitute for DRAM in environments with consistently high memory demands. However, for VDI workloads with bursty or variable usage patterns, it could provide a practical and efficient balance between performance and scalability. It reduces memory latency, limits swap usage, and enables higher desktop density, all without requiring full memory reservation. This makes NVMe Tiering a valuable feature to consider in modern virtual desktop environments where both user experience and infrastructure efficiency must be carefully balanced.

Closing Note

This research was conducted on VMware vSphere 8.0 Update 3, where NVMe Memory Tiering was available only in technical preview. With the release of vSphere 9.0, the feature has reached general availability. We are looking forward to seeing how it performs at scale in real production environments.

Photo by Conny Schneider on Unsplash