
Best Practices for configuring the JVM in a Kubernetes environment

Learn the ins and outs of JVM and K8S resource management, and how to best configure your container to maximize performance and minimize disruptions

Camille
Amazon Employee
Published Jun 10, 2025

Introduction

The Java Virtual Machine (JVM) is a crucial component of many Java-based enterprise applications, and its configuration can significantly impact the performance, reliability, and scalability of those applications, especially when deployed in a Kubernetes (K8S) cluster. This post aims to provide a set of best practices for configuring the JVM in a K8S environment, helping you make informed decisions about JVM parameters and ensure optimal application performance.
The recommendations in this post are based on a combination of industry best practices, real-world experience, and lessons learned from deploying and managing Java applications in Kubernetes clusters. I will cover key configuration areas such as memory management, garbage collection, JVM options, and container resources, and provide practical guidance on how to configure and tune these settings for various application workloads and cluster environments. By following these best practices, you can improve the efficiency, stability, and responsiveness of your Java applications running in K8S, ultimately leading to a better user experience and reduced operational overhead.
Before diving into these recommendations, we will take a brief look at how modern versions of the JVM manage memory and garbage collection (GC), how K8S schedules and allocates resources to pods, how the JVM behaves in a container environment, and what the risks of misconfiguration are. If you want to skip right to the recommendations, scroll down to the last section.
I encourage you to take a look at our EKS Best Practices guide for more insights into other areas; it is generally a must-read!

A brief look at the Java Virtual Machine (JVM)

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.
JVM workings

Memory Management

Objective: avoid out-of-memory (OOM) errors

One common indication of a memory leak is the java.lang.OutOfMemoryError error. This error indicates that the garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further. This error may also be thrown when there is insufficient native memory to support the loading of a Java class. In rare instances, the error is thrown when an excessive amount of time is being spent performing garbage collection, and little memory is being freed.

Overview of JVM Memory spaces

JVM memory spaces
Heap memory
The heap space is where objects created by Java applications are stored. It is the most important memory space for Java applications. By default, the maximum heap size is calculated from the available physical memory (or the container memory limit) and is set to 1/4 of that amount. The heap space is itself divided into two “generations”:
  • The Young Generation is where new objects are created, and short-lived ones are stored. It is divided into two spaces: the Eden space and the Survivor space. The first one is where new objects are created, while the second one is where objects that have survived one garbage collection cycle are moved.
  • The Old Generation, also known as the tenured generation, stores long-lived objects. Those that have survived some garbage collection cycles are moved from the young to the old generation.
Metaspace
The Metaspace is a non-heap space used by the JVM to store metadata and class definitions. It replaces the permanent generation (PermGen) found in older versions of Java: starting with JDK 8, PermGen was removed in favor of the Metaspace, which is designed to be more efficient and flexible.
Code Cache
The code cache is used by the JVM to store compiled code generated by the Just-In-Time (JIT) compiler.
Thread Stack Space
Each thread in a Java application has its own stack space, which is used to store local variables and method calls.
Shared libs
The shared libraries space (also known as the shared class data space) is a memory space used to store shared class metadata and other data structures. This memory space is shared across multiple Java processes. This allows various Java applications running on the same machine to share the same copy of the class metadata and other data structures.
The purpose of the shared libraries space is to reduce memory usage and improve performance by avoiding duplicate copies of the same class metadata. When multiple Java processes use the same class metadata, they can share the same copy of that metadata, which reduces memory usage and improves startup times for the applications.

Fine-grained memory spaces configuration

JVM memory spaces configuration
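Each of these memory spaces can be sized explicitly through JVM flags. Below is a minimal, illustrative sketch of a container-spec fragment that passes such flags through the JAVA_TOOL_OPTIONS environment variable; the values are examples only, not recommendations.

```yaml
# Container-spec fragment (illustrative values only):
#   -Xms / -Xmx                : initial / maximum heap size
#   -XX:MaxMetaspaceSize       : maximum Metaspace size
#   -XX:ReservedCodeCacheSize  : JIT code cache size
#   -Xss                       : per-thread stack size
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m -XX:ReservedCodeCacheSize=240m -Xss1m"
```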

CPU Management

CPU resources are often less of a bottleneck than memory for a Java application, and there is no simple way of limiting the CPU resources an application can utilize, unless running in a container (which I will touch upon in the best practices section).
More recent versions of the JDK have introduced a new parameter, -XX:ActiveProcessorCount, which overrides the number of CPUs the JVM detects at startup and therefore influences the size of its internal thread pools. It does not apply a hard limit on the number of CPU cores the JVM will actually use, which matters especially at startup, when you want the JVM to start as fast as possible using all available cores.

Garbage Collection Management

The garbage collector (GC) automatically manages the application's dynamic memory allocation requests. A garbage collector performs automatic dynamic memory management through the following operations:
  • Allocates from and gives back memory to the operating system.
  • Hands out that memory to the application as it requests it.
  • Determines which parts of that memory are still in use by the application.
  • Reclaims the unused memory for reuse by the application.
The Java HotSpot garbage collectors employ various techniques to improve the efficiency of these operations:
  • Use generational scavenging in conjunction with aging to concentrate their efforts on the areas of the heap most likely to contain a lot of reclaimable memory.
  • Use multiple threads to aggressively make operations parallel, or perform some long-running operations in the background concurrent to the application.
  • Try to recover larger contiguous free memory by compacting live objects.
There are a variety of collectors available.

How the JVM selects which GC to use

JVM GC decision tree

Objective: select the proper GC for your use case

Each GC has its own characteristics. As a general rule of thumb, let the VM select the GC. If the resulting performance doesn’t meet your requirements, here are some high-level guidelines to follow:
  • If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.
  • If the application will be run on a single processor and there are no pause-time requirements, then select the serial collector with the option -XX:+UseSerialGC.
  • If (a) peak application performance is the first priority and (b) there are no pause-time requirements or pauses of one second or longer are acceptable, then let the VM select the collector or select the parallel collector with -XX:+UseParallelGC.
  • If response time is more important than overall throughput and garbage collection pauses must be kept shorter, then select the mostly concurrent collector with -XX:+UseG1GC.
  • If response time is the highest priority, then select a fully concurrent collector with -XX:+UseZGC.
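If you want to verify which collector Ergonomics actually picks for a given set of container resources, a hypothetical one-off Pod such as the one below can help: -XX:+PrintCommandLineFlags makes the JVM print the flags it selected, including the chosen GC. The Pod name and image are assumptions; any recent JDK image will do.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gc-check                  # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: jdk
      image: eclipse-temurin:21   # any recent JDK image
      command: ["java", "-XX:+PrintCommandLineFlags", "-version"]
      resources:
        limits:
          cpu: "2"
          memory: "2Gi"           # with 2 CPUs and 2 GiB, expect G1 to be selected
```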

A brief look at K8S resources allocation

There are many great pieces that explain in detail how this works, and I encourage you to go through them. For the purpose of this document I will take a 10,000-foot view of how K8S allocates and reserves resources for containers.
K8S allows users to configure the minimum amount of resources a container needs to run through requests (both CPU and memory) and the maximum amount of resources a container is allowed to use through limits (both CPU and memory).
  • Limits and requests for cpu resources are measured in cpu units. In Kubernetes, 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine. Fractional requests are allowed. When you define a container with requests.cpu set to 0.5, you are requesting half as much CPU time compared to if you asked for 1.0 CPU. For CPU resource units, the quantity expression 0.1 is equivalent to the expression 100m, which can be read as "one hundred millicpu". Some people say "one hundred millicores", and this is understood to mean the same thing.
  • Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value: 128974848, 129e6, 129M, 128974848000m, 123Mi.
If a container ends up consuming more resources than the limit:
  • For memory, it means the container will be terminated by the OOM killer (reported as OOMKilled)
  • For CPU, it means the container will be throttled (it will not get access to more CPU time than its limit), but it will not be killed
A typical deployment file would look like this:
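The manifest below is an illustrative sketch; the application name, image, and resource values are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app                            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-java-app
  template:
    metadata:
      labels:
        app: my-java-app
    spec:
      containers:
        - name: app
          image: example.com/my-java-app:1.0.0 # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "1Gi"
```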
Depending on the configuration, Kubernetes will assign one of three Quality of Service (QoS) classes to the pod, based on the diagram below:
QoS classes
This QoS class will be used when determining which pods have to be killed/evicted first when the node comes under either cpu or memory pressure (i.e., the pods are consuming too much of the node resources).
  • BestEffort pods are given the lowest priority for access to the cluster’s resources and may be terminated if other pods require the resources.
  • Burstable pods can temporarily use more resources than they requested if the resources are available, but the cluster will not guarantee these additional resources. These pods would be next in line to be killed if there are no BestEffort pods and they exceed requests.
  • The cluster will ensure that Guaranteed pods have access to the requested resources at all times. They are guaranteed not to be killed until they exceed their limits or there are no lower-priority Pods that can be preempted from the Node.
Configuring pods running JVM applications as BestEffort or Burstable can lead to issues that I will touch on in a later section.
Certain behavior is independent of the QoS class assigned by Kubernetes. For example:
  • Any container exceeding a resource limit will be killed and restarted by the kubelet without affecting other Containers in that Pod.
  • If a container exceeds its resource request and the node it runs on faces resource pressure, the Pod it is in becomes a candidate for eviction. If this occurs, all Containers in the Pod will be terminated. Kubernetes may create a replacement Pod, usually on a different node.
  • The resource request of a Pod is equal to the sum of the resource requests of its component containers, and the resource limit of a Pod is equal to the sum of the resource limits of its component containers.
  • The kube-scheduler does not consider QoS class when selecting which Pods to preempt. Preemption can occur when a cluster does not have enough resources to run all the Pods you defined.

The JVM in a containerized environment

Since Java 11 (and now back-ported to Java 8, as of 8u372), the JVM has been aware that it is running inside a container. The container awareness detection uses Linux's control group (cgroup) filesystem to detect enforced resource quotas. As of this writing, the most recent versions of the long-term support releases (8, 11, 17 and 21) support both cgroups v1 and cgroups v2 configurations.
As you have seen, K8S lets deployments limit container resources via CPU and memory quotas. Those limits translate into options that are passed to the container engine when containers are deployed. Container engine options, in turn, set resource limits via the Linux cgroup pseudo-filesystem. The Linux kernel ensures that when resource limits are in place via the cgroup, no process goes beyond those limits (at least not for extended periods of time). When Java processes are deployed in such an environment, cgroup limits might be set for the deployed process. If the Java Virtual Machine does not take configured cgroup limits into account, it might risk trying to consume more resources than the operating system is willing to provide to it. The result could be the unexpected termination of the Java process.
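To see what the JVM actually detects from the cgroup configuration, recent JDKs can print the operating system metrics they observe. The sketch below is a hypothetical container-spec fragment; -XshowSettings:system (available on Linux in recent JDKs) prints the detected container memory limit and CPU count.

```yaml
# Container-spec fragment; prints the CPU count and memory limit the JVM detects from cgroups
command: ["java", "-XshowSettings:system", "-version"]
resources:
  limits:
    cpu: "1"
    memory: "512Mi"
```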

What can go wrong?

For production deployments, you want to avoid the following situations:
  • Getting java.lang.OutOfMemoryError thrown, followed by the process crashing because the JVM does not have access to enough memory;
  • Getting a pod killed because one or more of its containers ended up using more than its memory limit, as illustrated below:
JVM getting out of memory
  • Getting a pod killed/evicted because the node it’s hosted on gets under memory pressure and one or more of its containers are of the BestEffort or Burstable QoS class.
As you have seen so far, avoiding (or minimizing) these occurrences means you need to properly configure how much memory is allocated to the JVM and how container memory requests and limits are defined.

Best Practices and Recommendations

Memory

Reminders

In the vast majority of cases, when you don’t specify the desired heap size using the -Xmx parameter or the -XX:MaxRAMPercentage flag, Ergonomics ends up configuring the maximum heap value as ¼ of the available memory, which is generally not adequate for a container environment. The following exceptions apply:
  • If the container has up to 256MB of available memory, the maximum heap value will be 50%.
  • If the container has between 256MB and 512MB of available memory, the maximum heap value will be approximately 127MB.
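One quick way to check what Ergonomics computes for a given container size is to print the final flag values. The sketch below is a hypothetical container-spec fragment and assumes the image ships a shell along with a recent JDK.

```yaml
# Prints the maximum heap Ergonomics computed for a 1 GiB memory limit
command: ["sh", "-c", "java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize"]
resources:
  limits:
    memory: "1Gi"   # expect roughly 256 MiB (1/4 of the limit) to be reported
```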
Parameters such as -Xmx and -XX:MaxRAMPercentage only affect heap memory; if they are set too close to the container limit, they may squeeze the other memory spaces you have seen in an earlier section, such as:
  • Non-Heap Memory: A high setting for -Xmx can cause the non-heap memory to be exhausted, leading to OutOfMemoryError exceptions and application crashes.
  • Metaspace Memory: The metaspace size is not limited by default, but it can be configured using the -XX:MaxMetaspaceSize flag. The rule of thumb here is to limit the metaspace size to a value 16 times lower than the available memory. So, for example, if the JVM has 4GB of available memory, -XX:MaxMetaspaceSize=256M would be correct.
  • RAM-based file systems, such as tmpfs, can influence the -Xmx configuration too. For example, if the Pod has 4GB of available memory and a 1GB tmpfs mount, a common setting for -Xmx might be -Xmx3G. This ensures that the JVM has enough memory to run correctly, even with the additional memory requirements of the tmpfs mount.

Recommendations

Ideally, use load testing to find the maximum memory required across the different spaces such as Heap, Metaspace and ReservedCodeCache, then work backward from these requirements to compute the memory limit.
Precise formula: limit = ((Xmx * 2) + MaxMetaspaceSize + ReservedCodeCacheSize) / 0.75
  • The Metaspace stores metadata and class definitions, so normally 256MiB is enough
  • The CodeCache stores compiled code generated by the JIT, so normally 256MiB is enough
  • The 0.75 denominator and 2x multiplier give your component some breathing room
Example for a “small” component requiring 2GiB of heap memory: MEM Limit = (4GiB + 256MiB + 256MiB)/0.75 = 6GiB
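As a sketch, that sizing could translate into a container-spec fragment like the one below; the values simply follow the formula, and JAVA_TOOL_OPTIONS is just one convenient way to pass the flags.

```yaml
# "Small" component with a 2 GiB heap: limit = (2Gi * 2 + 256Mi + 256Mi) / 0.75 = 6 GiB
resources:
  requests:
    memory: "6Gi"
  limits:
    memory: "6Gi"
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms2g -Xmx2g -XX:MaxMetaspaceSize=256m -XX:ReservedCodeCacheSize=256m"
```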
If you don’t want to bother with MaxMetaspaceSize and ReservedCodeCacheSize, you can let the JDK define default values, and instead set the memory limit to Xmx * 3 (for smaller values of Xmx) down to Xmx * 1.5 (for bigger values of Xmx). For instance:
  • For a “small” component requiring 2GiB of heap memory, you can set the limit to 6GiB
  • But for a “bigger” component requiring 8GiB of heap memory, 12GiB should be sufficient
Otherwise, start with a reasonable memory limit based on your experience with your application and set -XX:MaxRAMPercentage to 60%. Adjust these values based on your application metrics (e.g., OutOfMemoryError occurrences, free memory).
For production deployments, define requests = limits to leverage the Guaranteed QoS class. For non-production deployments, you can set requests < limits, but do note that the bigger the difference between request and limit, the more likely you are to incur resource contention with other pods.
To avoid the JVM having to grow and shrink the heap at runtime, set the value of Xms equal to the value of Xmx.
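If you prefer the percentage-based approach, a minimal sketch (values are illustrative) could look like the fragment below; note that the memory request equals the memory limit, as recommended above, and that no CPU limit is set (see the CPU section further down).

```yaml
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    memory: "4Gi"   # memory request = memory limit
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=60.0"
```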

Garbage Collection

Reminders

Avoid using SerialGC in high-concurrency server-side environments by ensuring that the JVM is not limited to only 1 available CPU. There are several ways to achieve this, such as setting the container’s CPU limit to at least 1001m or using the -XX:ActiveProcessorCount flag with values greater than 1.
Be aware that if your container has less than 1792MB of available memory and you don’t force a specific GC version, Ergonomics will also select SerialGC.
You can also specify the desired GC implementation through Java arguments such as -XX:+UseG1GC or -XX:+UseParallelGC.

Recommendations

As a rule of thumb, use ParallelGC for heaps up to 4GiB and G1 for heaps above 4GiB. Although there are more GC implementations available, this should cover most use cases.
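As an illustrative sketch of that rule of thumb (flag values are examples only):

```yaml
# Heap of 3 GiB (<= 4 GiB) -> ParallelGC; for heaps above 4 GiB, swap in -XX:+UseG1GC
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms3g -Xmx3g -XX:+UseParallelGC"
```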

CPU

Reminders

The JVM interprets 1000m as 1 available CPU, 1001m as 2 CPUs, 2001m as 3, and so on.
If a container exceeds its CPU limit, Kubernetes throttles it: the container’s CPU usage is capped, which may result in a performance drop for the application, but the container will not be terminated or removed. Therefore, there is no definitive guidance on whether setting limits for CPU resources is required or not.

Recommendations

Do not specify resource limits on CPU. In the absence of limits, the request acts as a weight on how much relative CPU time containers get. This allows your workloads to use the full CPU without an artificial limit or starvation.
Unless you are low on resources, define requests >= 1001m for every container
Set the number of CPUs visible to Java with the flag -XX:ActiveProcessorCount to ~2x the number of CPUs your container requests
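Putting these CPU recommendations together, a hypothetical container-spec fragment could look like this: no CPU limit, a CPU request of at least 1001m, and ActiveProcessorCount set to roughly twice the request.

```yaml
resources:
  requests:
    cpu: "2"        # >= 1001m
    memory: "4Gi"
  limits:
    memory: "4Gi"   # memory limit only; no CPU limit
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:ActiveProcessorCount=4"   # ~2x the CPU request
```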

Tooling

Several tools are available to help you adjust the value of requests as your application evolves.


Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
