What is ROSA?

Red Hat OpenShift Service on AWS (ROSA) is a fully managed, turnkey application platform available in the AWS Cloud that allows you to focus on delivering value to your customers by building and deploying applications. ROSA is jointly engineered, operated, and supported by Red Hat and AWS. With a fully integrated and managed application platform such as ROSA, you get a much faster time to value and can focus on the things that matter most to your business and your customers, without worrying about running a complex platform.

ROSA Resilience Features

ROSA is designed to be highly resilient. Resilience is the ability of a system to react to failures and outages and remain functional. ROSA provides many features to protect against failure, and it also takes advantage of the high availability and resilience of the AWS Cloud to avoid downtime.

In the next sections, we will explore the ROSA architecture, how it is designed to be inherently resilient, and the concept of cloud resilience.

One of the ways to improve resilience is through redundancy. ROSA is designed to be highly redundant. By default, a ROSA cluster has at least three control plane nodes, a minimum of two infrastructure nodes and a minimum of two worker nodes for a single availability zone cluster.

The control plane manages the entire cluster. Each control plane node runs an API server, an etcd instance, and controllers. Having three control plane nodes gives your cluster high resilience and availability. If a control plane node fails, all API requests are automatically routed to the remaining nodes, with no noticeable impact on the cluster or on the customer’s applications.
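The arithmetic behind a three-node control plane can be sketched as follows. This is a minimal illustration, assuming etcd's majority-quorum rule (a cluster of n members needs n // 2 + 1 healthy members to keep accepting writes). The function names are hypothetical and are not part of any ROSA or OpenShift API.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd-style cluster with `members` members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can fail while a majority stays healthy."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare one, two, and three control plane nodes.
    for n in (1, 2, 3):
        print(f"{n} node(s): quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this rule, three nodes is the smallest count that survives the loss of one node, which matches the single-node failure scenario described above.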

Each infrastructure node hosts the built-in OpenShift registry, a router layer, and a monitoring layer. Having at least two infrastructure nodes ensures the resilience of the OpenShift router layer. As with control plane nodes, the failure of a single infrastructure node has no noticeable effect on the uptime of the cluster or of the customer’s applications.

The control plane and infrastructure nodes have the added advantage of being fully managed by Red Hat Site Reliability Engineers (SREs). Our SREs proactively monitor the ROSA cluster and are responsible for replacing any failed control plane or infrastructure nodes.

Worker nodes are the virtual machines that run your application pods. Worker nodes are more disposable and can be easily replaced. Having at least two worker nodes also contributes to high resilience. If a worker node fails, the control plane reschedules the affected pods onto the functioning worker nodes until the failed node is recovered or replaced. Failed worker nodes can be replaced manually, or automatically by enabling the cluster and machine autoscalers.
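The rescheduling behavior described above can be modeled with a toy example. This sketch is not the Kubernetes scheduler, which weighs resources, affinity rules, and much more. It only assumes that pods from a failed node move to the least-loaded healthy node. The function, pod, and node names are hypothetical.

```python
def reschedule(placement: dict[str, str], nodes: list[str],
               failed_node: str) -> dict[str, str]:
    """Return a new pod -> node mapping after `failed_node` goes down."""
    healthy = [n for n in nodes if n != failed_node]
    if not healthy:
        raise ValueError("no healthy worker nodes left to host the pods")

    load = {n: 0 for n in healthy}
    result: dict[str, str] = {}

    # Pods already on healthy nodes stay where they are.
    for pod, node in placement.items():
        if node in load:
            result[pod] = node
            load[node] += 1

    # Pods from the failed node go to the least-loaded healthy node.
    for pod, node in placement.items():
        if node not in load:
            target = min(healthy, key=lambda n: load[n])
            result[pod] = target
            load[target] += 1
    return result


if __name__ == "__main__":
    workers = ["worker-a", "worker-b"]
    pods = {"web-1": "worker-a", "web-2": "worker-b", "api-1": "worker-a"}
    print(reschedule(pods, workers, failed_node="worker-a"))
```

With only two workers, every pod lands on the one surviving node, so that node needs enough spare capacity to carry the full load until the failed node is replaced.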

The presence of several control plane nodes, infrastructure nodes, and worker nodes in a single cluster provides protection against single-node failures and outages.

AWS Resilience Features

The default (simplest) ROSA cluster is deployed into a single Availability Zone in the AWS Cloud. An Availability Zone (AZ) is a distinct physical location within an AWS Region. While this default deployment is a great first step for any IT professional wanting to learn about ROSA, single-AZ clusters are not recommended for production workloads because they do not provide protection from AWS zonal failures.

To protect against a zonal failure, ROSA makes use of Availability Zones. ROSA offers the option of clusters that are distributed across three Availability Zones, known as multiple availability zone (multi-AZ) clusters. In this deployment option, the control plane nodes, infrastructure nodes, and worker nodes are distributed across the different Availability Zones to improve resilience and availability. This is the recommended deployment pattern for production-grade workloads.
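The effect of spreading nodes across three Availability Zones can be sketched with a small simulation. This is an illustration only: the zone names are example values, the node names and both functions are hypothetical, and the round-robin placement is an assumption rather than a description of how ROSA actually assigns nodes to zones.

```python
from itertools import cycle

# Example zone names; any three zones in one AWS Region would do.
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def spread(node_names: list[str], zones: list[str]) -> dict[str, str]:
    """Assign nodes to zones in round-robin order."""
    return dict(zip(node_names, cycle(zones)))


def survivors(assignment: dict[str, str], failed_zone: str) -> list[str]:
    """Nodes that remain available when one zone fails."""
    return [node for node, zone in assignment.items() if zone != failed_zone]


if __name__ == "__main__":
    control_plane = spread(["cp-0", "cp-1", "cp-2"], ZONES)
    for zone in ZONES:
        alive = survivors(control_plane, zone)
        print(f"{zone} fails -> {len(alive)} of 3 control plane nodes remain")
```

In this model, losing any one zone removes one of the three control plane nodes, so two remain. In a single-AZ cluster, the same zonal failure would take out all three at once.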

In conclusion, ROSA is a robust cloud-native application platform that is designed by default to be highly resilient, available, and fault tolerant, and that takes advantage of the high availability of the AWS Cloud to improve its resilience.

Watch this video to learn more about ROSA resilience.


