Best Practices for configuring the JVM in a Kubernetes environment
Learn the ins and outs of JVM and K8S resource management, and how to best configure your container to maximize performance and minimize disruptions
Camille
Amazon Employee
Published Jun 10, 2025
The Java Virtual Machine (JVM) is a crucial component of many Java-based enterprise applications, and its configuration can significantly impact the performance, reliability, and scalability of those applications, especially when deployed in a Kubernetes (K8S) cluster. This post aims to provide a set of best practices for configuring the JVM in a K8S environment, helping you make informed decisions about JVM parameters and ensure optimal application performance.
The recommendations in this post are based on a combination of industry best practices, real-world experience, and lessons learned from deploying and managing Java applications in Kubernetes clusters. I will cover key configuration areas such as memory management, garbage collection, JVM options, and container resources, and provide practical guidance on how to configure and tune these settings for various application workloads and cluster environments. By following these best practices, you can improve the efficiency, stability, and responsiveness of your Java applications running in K8S, ultimately leading to a better user experience and reduced operational overhead.
Before diving into these recommendations, we will take a brief look at how modern versions of the JVM manage memory and garbage collection (GC), how K8S schedules and allocates resources to pods, how the JVM behaves in a container environment, and what the risks of misconfiguration are. If you want to skip right to the recommendations, scroll down to the last section.
I encourage you to take a look at our EKS Best Practices guide for more insights into other areas; it is generally a must-read!
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

One common indication of a memory leak is the `java.lang.OutOfMemoryError` error. This error indicates that the garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further. This error may also be thrown when there is insufficient native memory to support the loading of a Java class. In rare instances, the error is thrown when an excessive amount of time is being spent performing garbage collection, and little memory is being freed.

The heap space is where objects created by Java applications are stored. It is the most important memory space for Java applications. In modern JDK versions, the default maximum heap size is calculated based on the available physical memory and is set to 1/4 of it (a quick way to verify this is shown right after the list below). The heap space is itself divided into two “generations”:
- The Young Generation is where new objects are created, and short-lived ones are stored. It is divided into two spaces: the Eden space and the Survivor space. The first one is where new objects are created, while the second one is where objects that have survived one garbage collection cycle are moved.
- The Old Generation, also known as the tenured generation, stores long-lived objects. Those that have survived some garbage collection cycles are moved from the young to the old generation.
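To see what Ergonomics actually selects inside a container, a throwaway pod like the following prints the computed heap sizes (a minimal sketch; the pod name and image are placeholders, and any recent JDK image will do):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jvm-heap-defaults          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: jvm
      image: eclipse-temurin:21    # placeholder JDK image
      command: ["sh", "-c"]
      # With a 2Gi limit, expect MaxHeapSize to be roughly 512MiB (1/4 of available memory).
      args: ["java -XX:+PrintFlagsFinal -version | grep -E 'InitialHeapSize|MaxHeapSize'"]
      resources:
        limits:
          memory: 2Gi
```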
The non-heap space is used by the JVM to store metadata and class definitions. It is also known as the permanent generation (`PermGen`) in older versions of Java. Starting with OpenJDK 8, the `PermGen` space has been replaced by a new Metaspace, which is designed to be more efficient and flexible.

The code cache is used by the JVM to store compiled code generated by the Just-In-Time (JIT) compiler.
Each thread in a Java application has its own stack space, which is used to store local variables and method calls.
The shared libraries space (also known as the shared class data space) is a memory space used to store shared class metadata and other data structures. This memory space is shared across multiple Java processes. This allows various Java applications running on the same machine to share the same copy of the class metadata and other data structures.
The purpose of the shared libraries space is to reduce memory usage and improve performance by avoiding duplicate copies of the same class metadata. When multiple Java processes use the same class metadata, they can share the same copy of that metadata, which reduces memory usage and improves startup times for the applications.
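As an illustration, you can ask the JVM to require the shared class-data archive with the `-Xshare` switch (a hedged sketch; the pod name and image are placeholders, and whether a usable default archive is present depends on the JDK build in your base image):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cds-check                  # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: jvm
      image: eclipse-temurin:21    # placeholder JDK image
      # "-Xshare:on" makes startup fail loudly if the shared archive cannot be
      # mapped, which is a quick way to verify class data sharing is active.
      command: ["java", "-Xshare:on", "-version"]
```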

CPU resources are often less of a bottleneck than memory for a Java application, and there is no simple way to limit the CPU resources an application can utilize unless it is running in a container (which I will touch upon in the best practices section).
More recent versions of the JDK have introduced a new parameter, `-XX:ActiveProcessorCount`, that can limit the number of CPUs the JVM detects on startup and will influence the thread count, but it will not apply a hard limit on the number of CPU cores the JVM will actually use (which is especially important at startup, as you want the JVM to start as fast as possible using all available cores).

The garbage collector (GC) automatically manages the application's dynamic memory allocation requests. A garbage collector performs automatic dynamic memory management through the following operations:
- Allocates from and gives back memory to the operating system.
- Hands out that memory to the application as it requests it.
- Determines which parts of that memory are still in use by the application.
- Reclaims the unused memory for reuse by the application.
The Java HotSpot garbage collectors employ various techniques to improve the efficiency of these operations:
- Use generational scavenging in conjunction with aging to concentrate their efforts on areas in the heap that most likely contain a lot of reclaimable memory.
- Use multiple threads to aggressively make operations parallel, or perform some long-running operations in the background concurrent to the application.
- Try to recover larger contiguous free memory by compacting live objects.
There are a variety of collectors available: the Serial GC, the Parallel GC, the Garbage-First (G1) GC, and the Z GC (ZGC).
Each GC has its own characteristics. As a general rule of thumb, let the VM select the GC. If the performance doesn’t meet your requirements, here are some high-level guidelines to follow:
- If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option `-XX:+UseSerialGC`.
- If the application will be run on a single processor and there are no pause-time requirements, then select the serial collector with the option `-XX:+UseSerialGC`.
- If (a) peak application performance is the first priority and (b) there are no pause-time requirements or pauses of one second or longer are acceptable, then let the VM select the collector or select the parallel collector with `-XX:+UseParallelGC`.
- If response time is more important than overall throughput and garbage collection pauses must be kept shorter, then select the mostly concurrent collector with `-XX:+UseG1GC`.
- If response time is the highest priority, then select a fully concurrent collector with `-XX:+UseZGC`.
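In Kubernetes, one convenient way to pass the chosen GC flag without rebuilding the image is the `JAVA_TOOL_OPTIONS` environment variable, which the JVM picks up at startup (an illustrative fragment; the container name and image are placeholders):

```yaml
containers:
  - name: my-java-app
    image: my-registry/my-java-app:1.0    # placeholder image
    env:
      # The JVM logs "Picked up JAVA_TOOL_OPTIONS" at startup.
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:+UseG1GC"
```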
There are many great pieces that explain in detail how this works, and I encourage you to go through them. For the purpose of this document, I will take a 10,000-foot view of how K8S allocates and reserves resources for containers.
K8S allows users to configure the minimum amount of resources a container needs to run through `requests` (both memory and CPU) and the maximum amount of resources a container is allowed to use through `limits` (both CPU and memory).
- Limits and requests for `cpu` resources are measured in CPU units. In Kubernetes, 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine. Fractional requests are allowed. When you define a container with `requests.cpu` set to `0.5`, you are requesting half as much CPU time compared to if you asked for `1.0` CPU. For CPU resource units, the quantity expression `0.1` is equivalent to the expression `100m`, which can be read as “one hundred millicpu”. Some people say “one hundred millicores”, and this is understood to mean the same thing.
- Limits and requests for `memory` are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value: `128974848`, `129e6`, `129M`, `128974848000m`, `123Mi`.
If a container ends up consuming more resources than the limit:
- For `memory`, it means the container will be killed by the Kubernetes `OOMKiller`.
- For `cpu`, it means the container will be throttled, as it will not have access to enough CPU shares, but it will not be killed.
A typical deployment file would look like this:
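Here is a minimal illustration (names, image, and values are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-java-app
  template:
    metadata:
      labels:
        app: my-java-app
    spec:
      containers:
        - name: my-java-app
          image: my-registry/my-java-app:1.0   # placeholder image
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "2Gi"
              cpu: "2"
```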
Depending on the configuration, Kubernetes will assign one of three Quality of Service (QoS) classes to the pod: `Guaranteed` if every container in the pod has memory and CPU requests equal to its limits, `Burstable` if at least one container has a memory or CPU request or limit but the `Guaranteed` criteria are not met, and `BestEffort` if no container has any memory or CPU request or limit.
This QoS class will be used when determining which pods have to be killed/evicted first when the node comes under either cpu or memory pressure (i.e., the pods are consuming too much of the node resources).
- `BestEffort` pods are given the lowest priority for access to the cluster’s resources and may be terminated if other pods require the resources.
- `Burstable` pods can temporarily use more resources than they requested if the resources are available, but the cluster will not guarantee these additional resources. These pods would be next in line to be killed if there are no `BestEffort` pods and they exceed `requests`.
- The cluster will ensure that `Guaranteed` pods have access to the requested resources at all times. They are guaranteed not to be killed until they exceed their limits or there are no lower-priority Pods that can be preempted from the Node.
Configuring pods running JVM applications as `BestEffort` or `Burstable` can lead to issues that I will touch on in a later section.

Certain behavior is independent of the QoS class assigned by Kubernetes. For example:
- Any container exceeding a resource limit will be killed and restarted by the `kubelet` without affecting other Containers in that Pod.
- If a container exceeds its resource request and the node it runs on faces resource pressure, the Pod it is in becomes a candidate for eviction. If this occurs, all Containers in the Pod will be terminated. Kubernetes may create a replacement Pod, usually on a different node.
- The resource request of a Pod is equal to the sum of the resource requests of its component containers, and the resource limit of a Pod is equal to the sum of the resource limits of its component containers.
- The `kube-scheduler` does not consider QoS class when selecting which Pods to preempt. Preemption can occur when a cluster does not have enough resources to run all the Pods you defined.
Since Java 11 (and now back-ported to Java 8 with update `8u372`), the JVM has been aware that it is running inside a container. The container awareness detection uses Linux's control group (`cgroup`) filesystem to detect enforced resource quotas. As of this writing, the most recent versions of long-term support releases (8, 11, 17 and 21) support both cgroups v1 and cgroups v2 configurations.

As you have seen, K8S lets deployments limit container resources via CPU and memory quotas. Those limits translate into options that are passed to the container engine when containers are deployed. Container engine options, in turn, set resource limits via the Linux cgroup pseudo-filesystem. The Linux kernel ensures that when resource limits are in place via the cgroup, no process goes beyond those limits (at least not for extended periods of time). When Java processes are deployed in such an environment, cgroup limits might be set for the deployed process. If the Java Virtual Machine does not take configured cgroup limits into account, it might risk trying to consume more resources than the operating system is willing to provide to it. The result could be the unexpected termination of the Java process.
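To confirm what the JVM detects from the cgroup filesystem, you can run `java -XshowSettings:system -version` inside the container; on Linux JDKs this prints the detected CPU count and memory limit (a sketch; the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: container-awareness-check   # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: jvm
      image: eclipse-temurin:21     # placeholder JDK image
      # Prints the CPU count and memory limit the JVM derived from the
      # cgroup limits configured below.
      command: ["java", "-XshowSettings:system", "-version"]
      resources:
        limits:
          memory: 1Gi
          cpu: "2"
```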
For production deployments, you want to avoid the following situations:
- Getting `java.lang.OutOfMemoryError` exceptions, followed by the process crashing, because the JVM does not have access to enough memory;
- Getting a pod killed because one or more of its containers ended up using more than its memory `limit`;
- Getting a pod killed/evicted because the node it’s hosted on comes under memory pressure and one or more of its containers are of the `BestEffort` or `Burstable` QoS class.
As you have seen so far, avoiding (or minimizing) these occurrences means you need to properly configure how much memory is allocated to the JVM and how container memory `requests` and `limits` are defined.

With a few exceptions, in the vast majority of cases, when you don’t specify the desired heap size using the `-Xmx` parameter or the `-XX:MaxRAMPercentage` flag, Ergonomics ends up configuring the maximum heap value as 1/4 of the available memory, which is generally not adequate for a container environment. The following exceptions apply:
- If the container has up to 256MB of available memory, the maximum heap value will be 50% of it.
- If the container has between 256MB and 512MB of available memory, the maximum heap value will be approximately 127MB.
Parameters such as `-Xmx` and `-XX:MaxRAMPercentage` only affect heap memory, and if set too close to the container limit they may impact other memory spaces you have seen in an earlier section, such as:
- Non-heap memory: a high setting for `-Xmx` can cause the non-heap memory to be exhausted, leading to `OutOfMemoryError` exceptions and application crashes.
- Metaspace memory: the Metaspace size is not limited by default, but it can be configured using the `-XX:MaxMetaspaceSize` flag. The rule of thumb here is to limit the Metaspace size to a value 16 times lower than the available memory. So, for example, if the JVM has 4GB of available memory, `-XX:MaxMetaspaceSize=256M` would be correct.
- RAM-based file systems, such as `tmpfs`, can influence the `-Xmx` configuration too. For example, if the Pod has 4GB of available memory and a 1GB `tmpfs` mount, a common setting for `-Xmx` might be `-Xmx3G`. This ensures that the JVM has enough memory to run correctly, even with the additional memory requirements of the `tmpfs` mount.
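To make the `tmpfs` case concrete, here is a sketch (the container name and image are placeholders) where a memory-backed `emptyDir` volume counts against the container’s 4Gi limit, so the heap is capped at 3G:

```yaml
# Fragment of a pod spec.
containers:
  - name: my-java-app
    image: my-registry/my-java-app:1.0   # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx3G"                  # 4Gi limit minus ~1Gi of tmpfs leaves ~3G for the heap
    resources:
      limits:
        memory: 4Gi
    volumeMounts:
      - name: scratch
        mountPath: /scratch
volumes:
  - name: scratch
    emptyDir:
      medium: Memory                     # tmpfs; pages written here count against the memory limit
      sizeLimit: 1Gi
```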
Ideally, use load testing to find the maximum memory required across different spaces such as heap, `Metaspace` and `ReservedCodeCache`, and work backwards to compute the memory `limit` from these requirements. Precise formula:

`limit = ((Xmx * 2) + MaxMetaspaceSize + ReservedCodeCacheSize) / 0.75`
- The `Metaspace` stores metadata and class definitions, so normally `256MiB` is enough.
- The `CodeCache` stores compiled code generated by the JIT, so normally `256MiB` is enough.
- The `0.75` denominator and `2x` multiplier give your component some breathing room.
Example for a “small” component requiring 2GiB of heap memory:

`MEM limit = (4GiB + 256MiB + 256MiB) / 0.75 = 6GiB`
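Expressed as a manifest, that example might look like this (a sketch; the container name and image are placeholders, and the flags mirror the numbers above):

```yaml
containers:
  - name: my-java-app
    image: my-registry/my-java-app:1.0   # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        # 2GiB heap, 256MiB Metaspace cap, 256MiB code cache cap
        value: "-Xmx2g -XX:MaxMetaspaceSize=256m -XX:ReservedCodeCacheSize=256m"
    resources:
      requests:
        memory: 6Gi                      # = (2GiB * 2 + 256MiB + 256MiB) / 0.75
      limits:
        memory: 6Gi
```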
If you don’t want to bother with `MaxMetaspaceSize` and `ReservedCodeCacheSize`, you can let the JDK define default values, and instead set the memory `limit` to `Xmx * 3` (on smaller values of `Xmx`) down to `Xmx * 1.5` (on bigger values of `Xmx`). For instance:
- For a “small” component requiring 2GiB of heap memory, you can set the limit to 6GiB.
- For a “bigger” component requiring 8GiB of heap memory, 12GiB should be sufficient.
Otherwise, start with a reasonable memory `limit` based on your experience with your application and set `-XX:MaxRAMPercentage` to 60%. Adjust these values based on your application metrics (e.g., `OutOfMemory` exceptions, free memory).

For production deployments, define `requests = limits` to leverage the `Guaranteed` QoS class. For non-production deployments, you can set `requests < limits`, but do note that the bigger the difference between request and limit, the more likely you are to incur resource contention with other pods.

To avoid the JVM having to grow and shrink the heap at runtime, set the value of `Xms` equal to the value of `Xmx`, as in the sketch below.
SerialGC
in high-concurrency server-side environments by ensuring that the JVM is not limited to only 1 available CPU. There are several ways to achieve this, such as adjusting the container’s CPU limit to above 1001m or using the -XX:ActiveProcessorCount
flag with values greater than 1.Be aware that if your container has less than 1792MB of available memory and you don’t force a specific GC version, Ergonomics will also select
SerialGC
.You can also specify the desired GC implementation through Java arguments such as
-XX:+UseG1GC
or -XX:+UseParallelGC
.As a rule of thumb, use
ParallelGC
for heaps up to 4GiB and G1
for heaps above 4GiB. Although there are more GC implementations available, this should cover most use cases.The JVM interprets 1000m as 1 available CPU, 1001m as 2 CPUs, 2001 as 3, and so on.
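For example (an illustrative fragment; the container name and image are placeholders):

```yaml
containers:
  - name: my-java-app
    image: my-registry/my-java-app:1.0   # placeholder image
    resources:
      limits:
        cpu: "1001m"   # just above 1 CPU, so the JVM detects 2 CPUs
        memory: 2Gi    # above 1792MB, so Ergonomics will not fall back to SerialGC
```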
If a container starts exceeding its CPU `limits`, Kubernetes will begin throttling the container, meaning it will limit CPU usage, which may result in a performance drop for the application, though it’s important to note that the container will not be terminated or removed. Therefore, there is no definitive guidance on whether setting `limits` for CPU resources is required or not. My recommendations:
- Do not specify resource `limits` on CPU. In the absence of limits, the request acts as a weight on how much relative CPU time containers get. This allows your workloads to use the full CPU without an artificial limit or starvation.
- Unless you are low on resources, define `requests >= 1001m` for every container.
- Set the number of CPUs visible by Java with the flag `-XX:ActiveProcessorCount` to ~2x the number of your container CPU requests (see the sketch below).

Several tools are available to help you adjust the value of `requests` as your application evolves.
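Those CPU recommendations could translate into a container spec like this (a sketch; names and values are placeholders):

```yaml
containers:
  - name: my-java-app
    image: my-registry/my-java-app:1.0   # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        # ~2x the CPU request below; influences thread-pool sizing without
        # hard-capping actual CPU usage.
        value: "-XX:ActiveProcessorCount=4"
    resources:
      requests:
        cpu: "2"       # >= 1001m; acts as a relative weight since no CPU limit is set
        memory: 4Gi
      limits:
        memory: 4Gi    # memory limit kept; CPU limit intentionally omitted
```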
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.