OpenShift hosts critical applications at enterprises around the world. Teams running these applications expect them to be highly available, often targeting industry-standard benchmarks such as five nines (99.999%) of availability, so that their customers receive continuous service.
OpenShift provides many different constructs to help you deploy your applications in a highly available manner, such that outages are avoided even if application instances or their underlying infrastructure become unhealthy or are restarted. In this article, we will explore nine best practices for deploying highly available applications to OpenShift.
1: Multiple replicas
Running more than one instance of your pods ensures that deleting a single pod will not cause downtime. For example, suppose you run a single instance of your application: if that pod is deleted, your application is completely unavailable until a replacement pod is running. The outage is usually short, but it is still downtime. In general, if you run only one replica of the application, any abnormal termination of it results in downtime, so you should run at least two replicas. A Service load-balances network traffic across all ready pods of the deployment, so the remaining replicas keep serving requests while a failed pod is replaced.
We recommend setting multiple replicas as part of a deployment.
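A minimal sketch of a Deployment that runs three replicas behind a Service. The name my-app, the label app: my-app, the image quay.io/example/my-app:1.0, and port 8080 are hypothetical placeholders, not values from this article.

```yaml
# Hypothetical names throughout: my-app, quay.io/example/my-app:1.0, port 8080
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3              # more than one replica, so losing a single pod does not cause an outage
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: quay.io/example/my-app:1.0
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app            # traffic is spread across all ready pods carrying this label
  ports:
    - port: 8080
      targetPort: 8080
```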
2: Update strategy
The Rolling strategy (type RollingUpdate on a Kubernetes Deployment) is the default deployment strategy. It gradually replaces pods running the previous version of the application with pods running the new version, while ensuring that at least one pod remains available throughout the rollout.
With the Recreate strategy, the existing version is fully scaled down before the new version is scaled up, so the application is unavailable for part of the update. This downtime may be acceptable for applications that can tolerate maintenance windows or outages. For a mission-critical application with strict service level agreements (SLAs) and availability requirements, however, the Rolling strategy is the right choice.
We recommend using “RollingUpdate” when possible.
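A minimal sketch of the strategy section of a Deployment (apps/v1). The maxSurge and maxUnavailable values are illustrative assumptions, not recommendations from this article. Setting maxUnavailable to 0 makes Kubernetes bring up each new pod, and wait for it to become available, before removing an old one.

```yaml
# Fragment of a Deployment spec; the values are illustrative
spec:
  strategy:
    type: RollingUpdate        # the default; shown explicitly for clarity
    rollingUpdate:
      maxSurge: 1              # at most one pod above the desired replica count during the rollout
      maxUnavailable: 0        # never drop below the desired number of available pods
```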
3: Handling SIGTERM signal gracefully
When an OpenShift Container Platform (OCP) deployment is restarted or a pod is deleted, Kubernetes sends a SIGTERM signal to the main process of each container in the pod, giving each container an opportunity to shut down gracefully. Common reasons to handle SIGTERM are to finish processing in-flight client requests or to persist state.
The default grace period for handling SIGTERM is 30 seconds. If shutdown is expected to take longer, you can increase it with the terminationGracePeriodSeconds field of the pod spec. When the grace period expires, any containers still running are sent SIGKILL.
Handle the SIGTERM signal in applications to ensure that they shut down gracefully, and adjust terminationGracePeriodSeconds as required.
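A minimal sketch of where terminationGracePeriodSeconds sits in a Deployment's pod template. The 60-second value and the my-app names are hypothetical; choose a value that covers your application's longest expected shutdown.

```yaml
# Fragment of a Deployment; names and the 60-second value are hypothetical
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # time between SIGTERM and SIGKILL (default 30)
      containers:
        - name: my-app
          image: quay.io/example/my-app:1.0
```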
4: Probes
Probes (health checks) play a vital role in monitoring application health. Readiness probes determine whether an application is ready to accept traffic. Liveness probes determine if the application should be restarted.
Liveness probes are used to restart applications that have entered a broken state. This is a self-healing feature of Kubernetes and OpenShift, and it is important because it lets applications attempt to recover automatically.
Readiness probes are used to take applications out of the load-balancing pool when they enter an undesirable state and cannot serve traffic. This stops clients from reaching an application that needs to recover temporarily without being restarted.
An application receives traffic once its readiness probe passes and the pod becomes “Ready”. While the readiness probe is failing, or before it has passed for the first time, the pod is kept out of the pool and receives no traffic.
Some applications have long startup times. For example, they may run many initialization tasks, or depend on an external service that is not yet ready. This makes liveness and readiness probes difficult to configure, because it is hard to predict how long an application will take to start. To remedy this, you can use a startup probe to detect when an application has finished starting. Liveness and readiness probes begin monitoring the application’s health only after the startup probe succeeds.
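A minimal sketch of a container with all three probes, assuming the application exposes hypothetical HTTP endpoints /healthz (liveness) and /ready (readiness) on port 8080. The thresholds are illustrative: the startup probe allows up to 30 x 10 = 300 seconds for startup before the container is restarted.

```yaml
# Fragment of a container spec; endpoints, port, and thresholds are hypothetical
containers:
  - name: my-app
    image: quay.io/example/my-app:1.0
    ports:
      - containerPort: 8080
    startupProbe:              # liveness and readiness checks wait until this succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30     # up to 30 x 10s = 300s to start before the container is restarted
    livenessProbe:             # restart the container after 3 consecutive failures
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:            # remove the pod from Service endpoints while this fails
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```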
We recommend using liveness and readiness probes to help ensure that applications stay healthy and able to serve traffic. Use startup probes to determine when an application has finished starting.
5: External dependencies readiness
When the app starts, it should not crash due to a dependency such as a database that is not ready. You also want to ensure that dependencies are healthy during the life of your application. You can use initContainers/startupProbe to check external dependencies before running your main container. While the application is running, you can use the main container’s readinessProbe to ensure that it is only ready when connected to a healthy dependency.
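We discussed readiness probes in the previous section. Here is an example of an init container that waits for a PostgreSQL database to accept connections (the image is an assumption; any image that includes pg_isready works):
initContainers:
  - name: wait-postgres
    image: postgres:15-alpine   # assumed image that ships the pg_isready client
    command: ["sh", "-c", "until pg_isready -h example.org -p 5432 -U postgres; do echo waiting for postgres; sleep 2; done"]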
We recommend using an initContainer or startupProbe to postpone application startup until dependencies are healthy. While your application is running, use a readinessProbe to continue to monitor the dependency’s health.
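As a sketch of the runtime check, a readiness probe can run the same pg_isready command inside the main container. This assumes, hypothetically, that the application image also includes pg_isready; if it does not, an HTTP readiness endpoint that checks the database connection from within the application serves the same purpose.

```yaml
# Fragment of the main container spec; assumes the image includes pg_isready (hypothetical)
readinessProbe:
  exec:
    command: ["pg_isready", "-h", "example.org", "-p", "5432", "-U", "postgres"]
  periodSeconds: 10      # re-check the dependency every 10 seconds
  timeoutSeconds: 5
  failureThreshold: 3    # mark the pod unready after 3 consecutive failures
```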
6: Pod Disruption Budget
A Pod Disruption Budget (PDB) limits the number of pod replicas that the cluster can take down at once during voluntary disruptions such as maintenance operations. When a node is drained, for example, all the pods on that node are evicted and rescheduled elsewhere; if too many replicas go down at once while the application is under heavy load, availability can suffer. With a PDB in place, evictions that would bring the number of available replicas below the specified budget are refused until enough replicas are available again. A PDB protects only against voluntary disruptions; it cannot prevent pods from being lost to node failures.
A PDB is recommended for critical applications running in production. It lets you specify either the minimum number of pods that must remain available (minAvailable) or the maximum number of pods that can be unavailable (maxUnavailable) during managed maintenance.
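A minimal sketch of a PDB that keeps at least two pods of a hypothetical my-app Deployment available during drains. It assumes a cluster that serves the policy/v1 API (older releases use policy/v1beta1 with the same spec fields); the name, label, and minAvailable value are illustrative.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # evictions that would leave fewer than 2 ready pods are refused
  selector:
    matchLabels:
      app: my-app          # must match the labels on the application's pods
```

Keep minAvailable below the replica count: if the two are equal, no pod can ever be evicted and node drains will stall.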
7: Autoscaling
OpenShift has two autoscalers available for automatically scaling application pods:
Horizontal pod autoscaler (HPA)
HPA is a feature that automatically adds or removes pod replicas based on observed metrics. By using an HPA, you can maintain availability and improve responsiveness under unexpected traffic conditions.
HPA scales pod replicas based on the following formula:
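$$X = \left\lceil N \cdot \frac{c}{t} \right\rceil$$
In the above formula, X is the desired number of replicas, N is the current number of replicas, c is the current value of the metric, and t is the target value; the result is rounded up to the next whole number. You can find more details about the algorithm in the Kubernetes Horizontal Pod Autoscaler documentation.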
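A worked example of the formula with hypothetical numbers: three replicas averaging 70% CPU utilization against a 50% target scale out, and six replicas averaging 25% against the same target scale in.

```latex
\begin{aligned}
X_{\text{out}} &= \left\lceil 3 \cdot \tfrac{70}{50} \right\rceil = \lceil 4.2 \rceil = 5 \\
X_{\text{in}}  &= \left\lceil 6 \cdot \tfrac{25}{50} \right\rceil = \lceil 3.0 \rceil = 3
\end{aligned}
```

The result is then kept within the HPA's configured minReplicas and maxReplicas bounds.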
The following example illustrates an HPA for CPU utilization, where the pods scale up or down to maintain a minimum of 3 and a maximum of 10, with a target mean CPU utilization of 50%.
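A minimal sketch of such an HPA, assuming it targets a hypothetical Deployment named my-app. It uses the autoscaling/v2beta2 API version; on clusters that no longer serve v2beta2 (Kubernetes 1.26 and later), change the apiVersion to autoscaling/v2, which accepts the same fields.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical workload name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # target mean CPU utilization, as a percentage of requests
```

Utilization is measured against each container's CPU request, so the target pods must set CPU requests for this HPA to work.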
The next example shows an HPA based on memory usage. It scales pods up or down, keeping between 3 and 10 replicas, to maintain an average memory usage of 500Mi per pod:
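A minimal sketch of the memory-based HPA, assuming a hypothetical Deployment named my-app. It uses autoscaling/v2beta2; on Kubernetes 1.26 and later, change the apiVersion to autoscaling/v2, which accepts the same fields. The target is an absolute AverageValue of 500Mi per pod rather than a utilization percentage.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-memory
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical workload name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue   # an absolute per-pod average, not a percentage
          averageValue: 500Mi
```

Two HPAs that target the same workload conflict with each other. To scale on both CPU and memory, list both metrics in a single HPA; it then uses whichever metric yields the larger replica count.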
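NOTE: The above examples use the autoscaling/v2beta2 API version, which was the most recent version when this article was written. It has since been superseded by autoscaling/v2 (generally available in Kubernetes 1.23; v2beta2 is no longer served as of Kubernetes 1.26), which uses the same fields for these metrics. For CPU utilization metrics, you can use either autoscaling/v1 or a v2 API version. For memory utilization metrics, you must use a v2 API version (autoscaling/v2beta2 or autoscaling/v2).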
Vertical pod autoscaler (VPA)
While an HPA adds pods to meet demand, a VPA scales the resources of individual pods vertically. A VPA optimizes their CPU and memory request values and can improve the efficiency of cluster resource usage.
Like an HPA, a VPA automatically calculates target values by monitoring resource utilization. Unlike an HPA, however, a VPA evicts pods in order to update them with new resource requests and limits. The VPA respects any pod disruption budget that applies to the pods, so its evictions do not take the application below that budget. When the evicted pods are recreated, the VPA mutating admission webhook overwrites the pod resources with the optimized requests and limits before the pods are admitted to a node.
In Auto mode, the VPA assigns resource requests when pods are created and updates existing pods by terminating them when their requests differ significantly from the new recommendation.
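A minimal sketch of a VPA in Auto mode, assuming the Vertical Pod Autoscaler Operator is installed and a hypothetical Deployment named my-app exists.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical workload name
  updatePolicy:
    updateMode: "Auto"    # set requests at pod creation and evict pods whose requests drift far from the recommendation
```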
Other modes, such as Initial and Off, are described in the OpenShift documentation.
Things to consider before using VPA:
- You must have a minimum of two replicas for the VPA to automatically delete pods.
- Pods must be running in the project before VPA can recommend resources and apply the recommendations to new pods.
- VPA reacts to most out-of-memory events, but not in all situations.
- Avoid using HPA and VPA in tandem, unless you configure the HPA to use either custom or external metrics.
- In production, use VPA in recommendation-only mode (updateMode: "Off"). This helps you understand what the optimal resource request values are and how they vary over time.
- Prefer HPA over VPA for handling sudden increases in resource usage, because VPA bases its recommendations on a longer period of observation.
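A minimal sketch of the recommendation-only setup from the list above, again assuming a hypothetical Deployment named my-app. With updateMode set to "Off", the VPA computes recommendations but never evicts or mutates pods.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical workload name
  updatePolicy:
    updateMode: "Off"     # compute recommendations only; never evict or mutate pods
```

The recommendations appear in the object's status and can be read with, for example, oc get vpa my-app-vpa-recommender -o yaml.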
8: Leverage Pod Topology Spread Constraints
One of the core responsibilities of OpenShift is to automatically schedule pods on nodes throughout the cluster. However, if all pod replicas are scheduled on the same failure domain (such as a node, rack, or availability zone), and that domain becomes unhealthy, downtime will occur until the replicas are redeployed onto another domain. To prevent such an incident, you can leverage Pod Topology Spread Constraints.
Pod Topology Spread Constraints is a feature that lets you spread pod replicas evenly across your cluster. The following example shows a set of Pod Topology Spread Constraints.
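A minimal Pod manifest that defines the two constraints discussed next: the first spreads pods across nodes using a label called node, the second across availability zones using a label called zone. The pod name and the app: hello-pod label are assumptions; the image docker.io/ocpqe/hello-pod is the one named in the original example. whenUnsatisfiable: DoNotSchedule leaves a pod Pending rather than violating the allowed skew.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod                # hypothetical name
  labels:
    app: hello-pod               # hypothetical label, matched by both constraints
spec:
  topologySpreadConstraints:
    - maxSkew: 1                 # per-node pod counts may differ by at most 1
      topologyKey: node          # assumes every node has a unique "node" label
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: hello-pod
    - maxSkew: 1                 # per-zone pod counts may differ by at most 1
      topologyKey: zone          # assumes every node has a "zone" label set to its availability zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: hello-pod
  containers:
    - name: hello-pod
      image: "docker.io/ocpqe/hello-pod"
```

In practice, the same topologySpreadConstraints block usually goes in a Deployment's pod template so that it applies to every replica. Many clusters already label nodes with the well-known kubernetes.io/hostname and topology.kubernetes.io/zone labels, which can serve as topologyKey values instead of custom labels.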
This example defines two pod topology spread constraints. The first is used to ensure that pods are distributed across nodes by referencing the topology label “node” (defined by the “topologyKey” setting). To be effective, each node in the cluster must have a label called “node” with a unique value. The “maxSkew” setting tells OpenShift the maximum acceptable difference in the number of pods between any two nodes.
The second pod topology spread constraint in the example is used to ensure that pods are evenly distributed across availability zones. To be effective, each node in the cluster must have a label called “zone” with the value being set to the availability zone in which the node is assigned. By using two separate constraints in this fashion, you can ensure that your pods are distributed evenly across availability zones and the nodes within those zones.
Use Pod Topology Spread Constraints to ensure that pods are distributed across failure domains. Cluster operators should ensure that nodes are labeled in a consistent manner for the constraints to reliably distribute workloads.
9: Deploy applications using Blue/Green or Canary strategies
One major goal for most enterprises is to deploy applications and features without interrupting users or processes that interact with your application. To help ensure a successful rollout, Blue/Green and Canary deployments are both popular options for deploying new releases of an application. Let’s explore how the Blue/Green strategy can be performed in OpenShift.
Blue/Green deployments involve deploying a new version of your application alongside the old version. The old version can be referred to as “blue”, and the new version can be referred to as “green”. During the deployment, traffic is flipped from the blue side to the green side. In the event an issue occurs, traffic can be flipped back to the blue side.
This traffic flip can be performed using an OpenShift route. Before the flip, the route selects the blue service.
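A minimal sketch of such a route, assuming services named my-app-blue and my-app-green; the route name and the target port are hypothetical.

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app             # hypothetical route name
spec:
  to:
    kind: Service
    name: my-app-blue      # all traffic goes to the blue (current) version
  port:
    targetPort: 8080       # hypothetical service port
```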
To perform a traffic flip after deploying the green side, the route’s “to” section can be updated to target the green service:
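Only the to section of the route changes; the rest stays as it was. A sketch of the updated fragment, using the same hypothetical service names:

```yaml
# Fragment of the route; everything outside spec.to is unchanged
spec:
  to:
    kind: Service
    name: my-app-green     # previously my-app-blue; set it back to roll back
```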
Blue/Green deployments are powerful, but in some cases, it is not desirable to shift 100% of traffic to the other side. This is where Canary deployments come in, which provide greater flexibility over the amount of traffic being shifted.
Canary deployments allow you to route a subset of traffic to the new version of your application. Whereas blue/green deployments involve shifting 100% of traffic to the new version, canary allows you to shift, for example, 10% to the new version. Then, you can slowly ramp up to 100% as desired.
Below is an example of an OpenShift route that shifts 10% to the new version of an application (green):
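A minimal sketch of such a route: the to section keeps the blue service with weight 90, and alternateBackends adds the green service with weight 10. The route name and port are hypothetical.

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app             # hypothetical route name
spec:
  to:
    kind: Service
    name: my-app-blue
    weight: 90             # 90 / (90 + 10) = 90% of requests
  alternateBackends:
    - kind: Service
      name: my-app-green
      weight: 10           # 10 / (90 + 10) = 10% of requests
  port:
    targetPort: 8080       # hypothetical service port
```

To ramp up, raise the green weight and lower the blue weight in steps. A weight of 0 sends no new requests to that service, which completes the rollout (blue at 0) or rolls it back (green at 0).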
The percentage of traffic handled by a given service is calculated using the formula weight / sum_of_weights. The my-app-blue service handles 90% of traffic because 90/(90+10) is 0.9, so the my-app-green service handles the remaining 10%. Using canary deployments in this fashion allows you to ramp up traffic gradually while you monitor your application’s health.
Use Blue/Green or Canary deployments to prevent disruptions during the rollout of new application versions. Use Canary deployments if greater flexibility of the traffic rollout is required.