When organizations start their journey to the edge of the network, they quickly realize that the challenges they face are different from what they experience inside a traditional data center.

  • The first challenge is space. Many edge sites are physically smaller than their core or regional data center counterparts, and installing hardware in a space that was not designed for it must be carefully planned.
  • The second challenge is the environment. Heat, weather, radio-electrical emissions, lack of perimeter security, and a potentially limited supply of reliable power and cooling for equipment must all be addressed.
  • The third challenge is network connectivity, which at remote locations can vary greatly and is often slow or unreliable.

Amongst these, perhaps the most impactful challenge is the potential for having minimal to no IT staff on-site. This creates the need for drop-in deployments that require minimal intervention or know-how, coupled with centralized management to ensure a consistent environment for workloads, regardless of whether they are deployed within the data center or at a remote edge location.

For example, when deploying a cluster to handle data aggregation and machine learning analysis in factories, we need to set up clusters in hostile environments where production cannot be interrupted by a connectivity issue with a central location. We also need to add new hardware where no one originally planned for it, while maintaining full high-availability capabilities. This requires us to build the smallest possible cluster that delivers a local control plane, local storage, and compute to meet the requirements of demanding AI/ML or big data workloads while ensuring continuity.

Now imagine transposing these requirements to an oil rig, satellite, space station, or an emergency response situation where space, weight, power budget, and network needs are even more severely limited.

In all of the above cases, we have to:

  • Deliver true high availability.
  • Continue to fully operate, regardless of WAN connection state.
  • Pack this into the smallest footprint possible.
  • Be cost effective at scale.

This is what led us to re-engineer parts of Red Hat OpenShift to reduce the minimum number of machines needed to deliver a fully autonomous cluster. By allowing OpenShift to define nodes that have both supervisor and worker roles, we have reduced our minimum configuration from five to three servers. Fully supported starting with OpenShift 4.5, this smaller footprint will soon gain OpenShift Container Storage (OCS) support to collocate a Ceph storage cluster on the same servers in a hyperconverged configuration, eliminating a discrete storage footprint on the network and in turn reducing both acquisition costs and ongoing operating costs.

And while most of the target workloads we are seeing for these deployments are now container based, we do often see some that rely on virtual machines. This is where OpenShift Virtualization, based on the KubeVirt project and also fully supported with OpenShift 4.5, becomes really important. By bringing the management of VMs into Kubernetes, your toolsets are streamlined on a single platform; there is no need to add additional APIs to maintain virtualization infrastructure. As long as you deploy your compact cluster on physical hardware, virtualization is available through the Kubernetes API.
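
To illustrate, the following is a minimal sketch of a KubeVirt VirtualMachine manifest as exposed through the Kubernetes API; the name, memory request, and container disk image are placeholder values you would adapt to your own workload:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm            # placeholder name
spec:
  running: true               # start the VM as soon as it is created
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 2Gi       # size to your workload
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/kubevirt/fedora-cloud-container-disk-demo   # example image

Once applied with oc apply -f, the VM is managed like any other Kubernetes object on the cluster, alongside your containerized workloads.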

Operationalizing a Three-Node OpenShift Compact Cluster

The minimum requirements for a three-node setup, which we call a compact cluster, are as follows:

  • 3x physical machines (we are planning to support this setup in virtual machines soon)
    • Base OS: Red Hat Enterprise Linux CoreOS
    • 6 CPU cores
    • 24GB of RAM
    • 120GB of disk space

Of course, these figures will need to be adapted to the workloads you intend to host on these machines, but they serve as a minimum specification for guidance purposes.

In order to instruct your deployment to collocate supervisors and workers, you will need to perform the following steps:

1: Prerequisites

The prerequisites for a compact cluster are the same as for a standard OpenShift installation. These include, but are not limited to, DNS records, load balancers for the API and Ingress endpoints, and network connectivity between the nodes. Please familiarize yourself with the official documentation to get further information about all the prerequisites.

One thing to note is that the requirement for a bootstrap node still applies, so when planning the installation of a three-node cluster, you will need to temporarily account for a fourth node. Once the install is complete, the bootstrap node can be removed.

The bootstrap node is temporary and can be a VM running in your environment or even on your laptop, as long as the proper prerequisites, such as DNS and network connectivity, are in place for it.

Another thing to note is that the endpoints for the API load balancer and the Ingress load balancer should point to the IPs of the combined nodes.
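
As an illustration, an HAProxy configuration for the API endpoint might look roughly like the sketch below; the node names and 192.168.1.x addresses are placeholders, and the Ingress endpoints (ports 80 and 443) follow the same pattern with only the three combined nodes as backends:

frontend api
    bind *:6443
    mode tcp
    default_backend api

backend api
    mode tcp
    balance roundrobin
    # The bootstrap entry is only needed during installation and can be removed afterwards.
    server bootstrap 192.168.1.9:6443 check
    server node0 192.168.1.10:6443 check
    server node1 192.168.1.11:6443 check
    server node2 192.168.1.12:6443 check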

2: Setup

Once you’ve staged your environment and satisfied the prerequisites, you can create the install-config.yaml file. This file is the same as for a standard installation, with the exception that the worker replica count is set to 0.

Here’s an example:

apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '{"auths": ...}'
sshKey: 'ssh-ed25519 AAAA...'

Once you’ve created the install-config.yaml file, run openshift-install create manifests from the same directory. You’ll see the following output:

$ openshift-install create manifests
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings

Note the message about making the control plane schedulable. It signifies that you will be installing a three-node cluster where the masters also act as workers. Verify this by taking a look at the scheduling manifest:

$ cat manifests/cluster-scheduler-02-config.yml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: true
  policy:
    name: ""
status: {}

Verify that mastersSchedulable is set to true.

3: Installation

At this point, installation continues as normal: follow the rest of the installation instructions in the official documentation. The next step is to create your Ignition files and install Red Hat Enterprise Linux CoreOS on the bootstrap node and the masters.
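
Once the bootstrap process completes and the cluster is up, a quick sanity check is to list the nodes; in a compact cluster, each node should carry both the master and worker roles. The node names and versions below are illustrative only:

$ oc get nodes
NAME      STATUS   ROLES           AGE   VERSION
master0   Ready    master,worker   40m   v1.18.3
master1   Ready    master,worker   40m   v1.18.3
master2   Ready    master,worker   39m   v1.18.3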

From Setup to Ongoing Management

Setting up the on-site clusters is not the end of the story, though, as being able to manage a fleet of them is essential, especially if no IT specialist will be on site to manage them.

Our first recommendation is to do everything you can to maintain identical configurations across your deployments. This generally means prohibiting manual configuration changes on the servers to ensure consistency and reduce the potential for errors. Remote connections or use of the OpenShift web console should be limited to debugging, and any configuration change should go through a centralized process, as you would do for code. This is the basis of what is commonly called GitOps, and using a dedicated tool such as ArgoCD is one possible path to ensure proper configuration management of your fleet.

However, you will also need a way to centralize status, deploy workloads based on policies, and perform all types of centralized administration tasks, and we have a tool to do exactly this. Released at the beginning of August, Red Hat Advanced Cluster Management for Kubernetes (ACM) has been tested with thousands of clusters and includes GitOps tooling that enables policy-based cluster management. ACM should be deployed on a cluster in a central location to which all of the edge clusters can connect. For more information, I recommend watching the ACM YouTube channel.
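
As a sketch of what this GitOps flow can look like, here is a minimal Argo CD Application manifest that keeps a cluster synchronized with a Git repository; the repository URL, path, and namespaces are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: edge-site-config                                   # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/edge-config.git    # placeholder repository
    targetRevision: main
    path: clusters/site-a                                   # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: edge-apps                                    # placeholder namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

With automated sync and self-healing enabled, any drift on the cluster is reverted to what is declared in Git, which is exactly the behavior you want for sites with no IT staff on hand.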

Finally, some of your workloads may need a mechanism to deliver fresh data to their processing units, such as new models for a machine learning inference process. We will soon be releasing examples of how to do this efficiently, as described in the blog post we published recently about Boosting manufacturing efficiency and product quality with AI/ML, edge computing and Kubernetes.

Our journey to deliver the base infrastructure to extend our Open Hybrid Cloud to the edge is just beginning, and we will soon be enabling more location types and use cases where even compact clusters are too large. Stay tuned to the OpenShift blog for more in the coming months!

 


About the authors

Christian Hernandez currently leads the Developer Experience team at Codefresh. He has experience in enterprise architecture, DevOps, tech support, advocacy, software engineering, and management. He's passionate about open source and cloud-native architecture. He is an OpenGitOps Maintainer and an Argo Project Marketing SIG member. His current focus has been on Kubernetes, DevOps, and GitOps practices.
