Opening introduction

Capacity planning is more an art than a science: While performance tuning is a process of optimizing existing systems to achieve higher performance, capacity planning determines how a system needs to be configured to behave in the most advantageous way, using current performance as a baseline.

It is difficult to predict the amount of resources that production OpenShift objects need to function optimally. OpenShift typically runs a varied set of resources under different profiles in a multipurpose environment.

It is the design team’s goal to anticipate the required total capacity by ensuring the installation prerequisites are met and by computing the necessary resources in the case of a cloud installation, but it is the development team that knows the exact resource expectations of a service implementation.

The good news is that there are some tools that can help figure out the right numbers in a pre-production phase. In this introductory blog article, we provide some high-level tips and general configuration recommendations and discuss these tools.

Reviewing the OpenShift core tools

To keep control of the availability of resources across an OpenShift cluster, an administrator can use some core mechanisms.

ResourceQuotas are namespace-wide restrictions that prevent all the pods in a namespace, irrespective of which node they are scheduled on, from exceeding the assigned quota or hogging the namespace's resources, including storage and object counts. So you can specify a hard limit of 1 core, 4 GiB of RAM, and a maximum of 10 pods:

spec:
  hard:
    cpu: 1
    memory: 4Gi
    count/pods: 10

LimitRanges are similar to ResourceQuotas except that they apply per object: Cluster administrators use them to constrain the resource usage of individual pods and other objects, ensuring the optimal use of resources by defining a range of available compute and memory resources:

spec:
  limits:
    - type: "Container"
      max:
        cpu: "2"
        memory: "1Gi"
      min:
        cpu: "0.7"
        memory: "4Mi"
      default:
        cpu: "1.3"
        memory: "100Mi"

PriorityClasses let you define pod priority, and this is instead handled by the OpenShift scheduler. The priority indicates the importance of a pod relative to other pods. If a pod cannot be scheduled, the scheduler attempts to evict lower-priority pods to allow it to be scheduled. You set a priority class with a mandatory value field (the bigger the number, the higher the priority):

oc create priorityclass high-priority --value=1000 --description="high priority"

A good use of PriorityClass is to reserve the largest priority numbers for critical system pods that should not usually be preempted or evicted. A potential abuse is that users may try to set the highest possible priority for their pods. To prevent that, ResourceQuota supports a PriorityClass scope, so administrators can restrict which priority classes may consume quota in a namespace.
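As a sketch of that last point (names and numbers here are illustrative), the following quota only counts pods that use the high-priority class, capping how many of them a namespace may run:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
spec:
  hard:
    pods: "5"
  scopeSelector:
    matchExpressions:
      # Only pods with priorityClassName "high-priority" count against this quota
      - operator: In
        scopeName: PriorityClass
        values: ["high-priority"]
```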

Kubernetes QoS

Also, Kubernetes defines QoS (Quality of Service) classes: When a pod is created, it is assigned a QoS class, a property derived from the pod's resource requests and limits configuration:

  • BestEffort: No requests or limits are defined - A BestEffort container can consume as much CPU as is available on a node, but with the lowest priority.
  • Burstable: Requests are set lower than limits, or only some are set - The container is guaranteed the minimum amount of CPU requested, but it may or may not get additional CPU time. Excess CPU resources are distributed based on the amount requested across all containers on the node.
  • Guaranteed: Requests equal limits for both memory and CPU - The container is guaranteed the amount requested and no more, even if additional CPU cycles are available.

For best hardware use in a non-production cluster, which is more dynamic and creates and destroys objects at a high rate, it may be a good idea to mainly use BestEffort and Burstable. On production clusters, where we want things to be stable and predictable, it is better to rely mostly on the Guaranteed class with some Burstable.
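For example, a pod is classified as Guaranteed when every container sets requests equal to limits for both CPU and memory (the image name and values below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:          # limits match requests exactly, so QoS class is Guaranteed
          cpu: "500m"
          memory: "256Mi"
```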

Don't touch the OpenShift core

On OpenShift core components, you do not want to apply quotas.

The risk is severely impacting important functionality. Suppose you set a ResourceQuota on the etcd namespace. In the case of a spike in the amount of data handled by etcd, the applied constraints may not be enough to sustain that spike, and etcd might exhaust its memory and be killed, deteriorating the cluster's health.

Other vendors plug-ins and integrations

Do the vendors of additional components such as monitoring, logging, or others have resource requirements or documentation? Typically, quotas and limits should not be modified here either; when in doubt, consult the vendor.

To overcommit or not to overcommit?

Overcommitting is the practice of presenting to the hosted applications more CPU and memory resources than are physically available: This increases the density of workloads running on the platform but can degrade individual application performance.

In case of OpenShift on OpenStack, for instance, when looking to implement overcommit, consider the complexity of the two systems. Both can manage resources through detailed tuning. And you have probably had success tuning each independently. But when placing them together, you should tune based on the complete stack, not each layer in isolation.

In this case, both OpenShift and OpenStack are capable of implementing complex resource management. If we allow them both to do it at the same time, we need to ensure they do not conflict with each other. So if you do choose to implement resource management at both levels, be sure you understand exactly how that will interact. This can be hard, is often time-consuming and error-prone, and consistently needs review.

Instead, it is easier to overcommit on only one level and ideally the one closer to the user. And in this case, that is OpenShift, which has complex resource management already built in.

But for this to work, it needs a consistent set of resources presented to it. If OpenStack is constantly adjusting memory and resources, how will OpenShift be able to accurately adjust for it? It is like two people trying to drive the same car in different directions. It is often better to trust OpenShift to work its magic based on consistent resources and avoid having the resource rug pulled out from under it by OpenStack trying to manage this as well.

We are not saying you cannot overcommit, just that if you do so, you will need to have a very detailed knowledge of your workloads, one which you probably cannot obtain for everything you have on your cloud. And implementing overcommit in one place on this stack will make troubleshooting resource issues easier and quicker, while likely reducing them in the first place.

The Reference Architecture of OpenShift on OpenStack may include further material on this topic, and more is to come.

Applications on OpenShift

To define the required resources of applications running on OpenShift, the base idea is to run a series of stress tests to measure the amount of resources and determine what the baseline will be, before moving to specific tunings (such as on the JVM or the single pods allocations).

Admins and developers can get an overview of the resource usage and requests of their applications by going to the OpenShift Web Console → Monitoring section. Here is the historical consumption of both CPU and memory: For all apps in the project aggregated, or separately.

Figure 1: The Monitoring tab in the Developer view in Console

Resources limits and compressibility

Regarding how to configure resource limits, there are few to no issues with setting a CPU limit very high or not setting one at all: CPU is a compressible resource. This means that if your app reaches its CPU limit, Kubernetes simply begins throttling your containers: The CPU is artificially restricted, potentially giving your app worse performance, but the pod will not be terminated or evicted. You can use readiness or liveness health probes to check whether performance is impacted.

The situation is more complicated when several applications create resource consumption peaks simultaneously. In that case, you can again rely on the Kubernetes QoS classes, prioritizing Guaranteed in production and Burstable in test environments, as explained in the Kubernetes QoS section above.

In some specific situations, however, setting CPU hard limits may be a good choice, for example when you want to:

- Achieve predictable performances rather than the best performances

- Protect deployments from other greedy containers

- Avoid resource starvation when other concurrent applications may peak their resources consumption while they are starting up

- Keep overcommitting under control
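Putting these cases together, a container resources stanza along these lines (the values are illustrative, not a recommendation) trades peak throughput for predictability by capping CPU:

```yaml
resources:
  requests:
    cpu: "250m"      # guaranteed share, used by the scheduler for placement
    memory: "128Mi"
  limits:
    cpu: "500m"      # hard cap: the container is throttled here, never evicted
    memory: "256Mi"  # memory is incompressible: exceeding this gets the container OOM-killed
```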

Memory, instead, is an incompressible resource: If the allocation is too low, this will invoke the language's memory management systems, such as garbage collection, or worse, the application may run out of memory and its pods will be killed by the Linux kernel. Memory optimization is a very vast topic and requires mixing programming best practices, stress tests, and the right tools.

We will see a couple of these tools in the next two sections; in conjunction with the OpenShift monitoring stack, they can help you determine the right amount of memory to assign to your applications.

So the key takeaway when doing capacity planning for containerized apps is that a lack of compressible resources (such as CPU or networking) throttles the resources themselves but does not kill pods, while unhandled pressure on incompressible resources (such as memory) may kill your pods.

Java on OpenShift

In the case of Java programs, it is well known that the longer they run, the better they perform. Put another way, they consume many more resources during startup. So, when defining baselines, it is useful to keep this in mind and treat this initial period as a "warm-up" phase during which you take few or no measurements.

Teams will want to run a series of tests to benchmark the app's performance before deciding whether additional tuning is needed and, if so, consider tuning the JVM. For this we provide OpenJDK-specific documentation and a Red Hat Lab: Answer yes to the "Is the application running on OpenShift?" option and fill in the fields to retrieve the recommended settings on this page: https://access.redhat.com/labs/jvmconfig/

Figure 2: JVM Options Configuration Lab

The choice of a garbage collector is not easy and would require a very long discussion. In short, with the CMS collector being deprecated soon, and Shenandoah still experimental, the choice narrows mostly to G1, the default and the best collector in the majority of cases, and Parallel, which may make sense in the case of hyperthreaded containers.
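For instance, the Red Hat OpenJDK container images read extra JVM flags from the JAVA_OPTS_APPEND environment variable, so explicitly selecting G1 together with container-aware heap sizing can be sketched as follows (the flag values are illustrative, not a recommendation):

```yaml
env:
  - name: JAVA_OPTS_APPEND
    # Use the G1 collector and size the heap as a fraction of the container memory limit
    value: "-XX:+UseG1GC -XX:MaxRAMPercentage=75.0"
```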

You may also want to consult the Universal Base Images OpenJDK runtime images page for more info on how Red Hat packs and optimizes the OpenJDK container image.

The VerticalPodAutoscaler

In the process of defining the baselines and succeeding in capacity planning, you can also benefit from the Vertical Pod Autoscaler, now a generally available feature in OpenShift 4.8.

The VPA monitors the historical trend of resources consumed by pods over time and can automatically tune the pods' limits or, in Off mode, just provide recommendations that you can apply manually.

In Auto mode, the VPA applies the computed settings automatically throughout the pod lifetime: It redeploys any pods in the project that are out of alignment with its recommendations so that the limits are applied in their configuration:

Figure 3: The VerticalPodAutoscaler in action

To give an example, let's imagine you have deployed the "Get started with Spring" tutorial in the spring-one namespace and want to get some LimitRange recommendations to understand how it behaves.

You begin by installing the VerticalPodAutoscaler from the Operators → OperatorHub in its openshift-vertical-pod-autoscaler system namespace. Then, you create a VPA object in the spring-one namespace, where you want to observe the deployment named rest-http-example. You don't want the VPA to take control over your pods, so you just specify the "Off" updateMode:

oc create -f - << EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: spring-recommender
  namespace: spring-one
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       rest-http-example
  updatePolicy:
    updateMode: "Off"
EOF

At this point, you simulate typical workloads and finally retrieve the range of values the VPA suggests by printing the VPA content and examining the target section, which contains the recommended values:

$ oc -n spring-one get vpa spring-recommender --output json | jq .status.recommendation
{
  "containerRecommendations": [
    {
      "containerName": "rest-http-example",
      "lowerBound": {
        "cpu": "25m",
        "memory": "262144k"
      },
      "target": {
        "cpu": "25m",
        "memory": "262144k"
      },
      ...
    }
  ]
}

Scheduler settings

There is also great control over scheduling settings to take into account for capacity optimization. Users can decide where to deploy pods with NodeSelector, define scheduler profiles for uniform cluster usage, use PodPriority, and tune the minimum number of available replicas with PodDisruptionBudget. Ops can observe the availability of resources with the integrated OpenShift monitoring stack or with the Cluster Capacity Tool, an upstream project that can help measure and simulate cluster capacity availability either with a local binary or with ad-hoc pods.
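As a complement to those tools, a quick manual check of allocatable versus requested resources can be done with standard oc commands (the node name below is a placeholder):

```shell
# Current CPU/memory usage per node (requires the metrics stack)
oc adm top nodes

# Allocatable capacity and the sum of requests/limits on one node
oc describe node <node-name> | grep -A 10 "Allocated resources"
```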

Summary

In this blog article, we reviewed the available core tools for limiting and configuring application resources, clarified the right OpenShift core settings, and lightly covered the tricky topic of platform overcommit. We then reviewed how to start a capacity planning process for applications by suggesting some methodology and tools, notably the JVM Options Configuration lab and the Vertical Pod Autoscaler.

Happy tuning.


Categories

How-tos, cloud scale, massive scale, scaling, configuration
