Thanks to our testers: Mark Dunnett and Marc Nozell

Abstract

The release of Red Hat OpenShift 4.6 brought the general availability of remote worker nodes along with a number of use cases at the network edge. Steve already had an OpenShift 4.6 cluster running in his home lab, but Kevin’s home lab is like many others in that it doesn’t have the hardware and resources to run a whole OpenShift 4 cluster. We decided to build a small footprint remote worker node at the network edge and connect it to Steve’s cluster.

Disclaimer: This is a proof of concept we built to work through the logistics of accomplishing this connectivity and is not intended for production use.

Architecture

Why WireGuard?

Steve’s and Kevin’s home labs were halfway across the country from each other but accessible through the internet. Since OpenShift expects simple connectivity between all the nodes, we knew building a path from the remote worker to the main cluster would be our biggest hurdle. We brainstormed a few possible solutions to this problem:

  1. Open/forward ports on both of our home routers to allow all the pertinent traffic to and from Kevin’s remote worker and the rest of the cluster
  2. Set up a full VPN from one side to the other to allow a more “direct” connection between the new node and the cluster
  3. Use WireGuard to set up an ad-hoc VPN connection bridging the remote worker and cluster

We opted for the third choice, partly because it felt like the simplest solution, but also because we wanted an opportunity to use WireGuard in the real world. To simplify the overall configuration, we decided to build the WireGuard tunnel between the remote worker and the helper directly, rather than including the rest of the existing nodes in the tunnel.

The details

This article does not cover the installation of the main cluster itself; these are only the steps we went through to connect a new remote worker to an existing cluster. For details on setting up a whole cluster, review the OpenShift 4 installation documentation. In our labs, we use a helper machine that provides a number of useful services, including DHCP, DNS, PXE, a load balancer, and a place to run playbooks to manage our cluster.

For the remote worker node, we opted for Red Hat Enterprise Linux (RHEL) 7 over the default choice of Red Hat CoreOS (RHCOS), as RHEL 7 gave us the flexibility to customize and troubleshoot WireGuard while we were setting it up for the first time. We also used RHEL 7 for our helper node so we could run the supported Ansible scale-up playbook for adding a RHEL 7 node to the cluster. The helper machine lives alongside the main cluster and can be reused to add subsequent remote workers. If you need a new machine to run the playbook(s), it can be created alongside the new worker as part of the same process of building and adding the worker to the cluster.

Note: We used RHEL 7 for the helper VM because the required package (openshift-ansible) is not available for RHEL 8.

For this article, we will use the following examples:

Name                    Network IP       WireGuard IP   OS
Steve’s helper          192.168.30.100   10.0.0.1       RHEL 7.8
Steve’s master 1        192.168.30.101                  RHCOS
Steve’s master 2        192.168.30.102                  RHCOS
Steve’s master 3        192.168.30.103                  RHCOS
Steve’s worker 1        192.168.30.104                  RHCOS
Steve’s worker 2        192.168.30.105                  RHCOS
Kevin’s remote worker   192.168.40.100   10.0.0.2       RHEL 7.8

Network             Public IP
Steve’s home lab    1.2.3.4
Kevin’s home lab    5.6.7.8

The existing home lab networks use 192.168.30.0/24 and 192.168.40.0/24 for Steve’s and Kevin’s networks, respectively. The WireGuard subnet is unrelated to the existing subnets: it must not conflict with them, and for simplicity it should come from the RFC 1918 private (non-routable) address space.

For the purpose of demonstration, the 1.2.3.4 and 5.6.7.8 IP addresses are used to refer to Steve’s and Kevin’s public IPs, respectively. To replicate this configuration on other networks, it’s necessary to adjust these IPs to match the real public IP addresses associated with the cluster and remote worker’s networks.

Our Setup and Configuration

Overview

Our first order of business was getting WireGuard configured and working from Kevin’s remote RHEL 7 worker to the helper in Steve’s cluster. Once we had that connectivity functional, we used the Ansible scale-up playbook to add the new node to the cluster. When the node was successfully added to the cluster, we created static routes on the members of the cluster so they knew to direct traffic through the helper to get to Kevin’s worker for any necessary connectivity.

Configure Home Lab Networks

To simplify the configuration of connectivity between our labs, we used static IPs for all VMs to streamline the firewall/NAT rules and WireGuard interfaces. We configured our routers to forward WireGuard’s traffic to the helper and the remote worker on each side. WireGuard defaults to UDP port 51820, but any port works as long as you configure the endpoints to match.
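Router configuration varies by device, but if firewalld is running on the helper and remote worker themselves, the WireGuard port also needs to be allowed there. A minimal sketch, assuming the default port of 51820/udp:

helper & remote # firewall-cmd --add-port=51820/udp --permanent
helper & remote # firewall-cmd --reload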

For the remote worker to resolve names inside the cluster, we want it to point to the helper for DNS. This is a relatively simple process, where we set the helper’s internal IP address as the primary DNS server in the network configuration for the remote worker, adjusting the existing entries to be secondary/tertiary nameservers:

remote # vim /etc/sysconfig/network-scripts/ifcfg-<interface>
...
NAME=enp1s0
DEVICE=enp1s0        # before we change anything
DNS1=192.168.40.200  # upstream DNS server in kevin’s home lab network
DNS2=192.168.40.201  # upstream DNS server in kevin’s home lab network
...


# After we have put the helper at the top of the list of DNS servers
...
NAME=enp1s0
DEVICE=enp1s0        # after our changes
DNS1=192.168.30.100  # This is helper.steveslabdomain.com
DNS2=192.168.40.200  # upstream DNS server in kevin’s home lab network
DNS3=192.168.40.201  # upstream DNS server in kevin’s home lab network
...


# Restart services for changes to take effect
remote # systemctl restart NetworkManager
remote # cat /etc/resolv.conf
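Once the WireGuard tunnel described below is up, name resolution through the helper can be spot-checked from the remote worker. For example, looking up one of the cluster nodes directly against the helper (dig comes from the bind-utils package) should return its cluster IP:

remote # dig +short master1.steveslabdomain.com @192.168.30.100
192.168.30.101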

Configuring WireGuard Endpoints

On both the helper and remote worker, we configured the RHEL 7 repositories to allow us to install the WireGuard packages:

helper & remote # yum install \
https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm \
https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm 

Then we installed the WireGuard kernel module package and the tools to manage the tunnels, and rebooted the machines so the new kernel module could be loaded:

helper & remote # yum install kmod-wireguard wireguard-tools
helper & remote # reboot
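Before going further, it is worth confirming that the new WireGuard kernel module actually loads. A quick check on both machines:

helper & remote # modprobe wireguard
helper & remote # lsmod | grep wireguard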

Next, we generated a WireGuard private/public key pair on each of the two machines we wanted to connect with a tunnel. Starting with the helper, we generated Steve’s private key and wrote it to a file. The system warned us that the default permissions for this new file aren’t terribly secure, so we changed the mode of the file after creating it:

helper # wg genkey > helper-private-key
Warning: writing to world accessible file.
Consider setting the umask to 077 and trying again.
helper # chmod 600 helper-private-key
helper # cat helper-private-key
qF6R37cuXJ+pnKCDbEQKjOorEcQL6pDEdo2zYH09zU0=

We then used that private key to generate the corresponding public key, saved to another file in the same directory:

helper # wg pubkey < helper-private-key > helper-public-key
helper # cat helper-public-key
iJxQgLqHntxoSTDcchl3bo2xsqRVyAf0uYOhGkzu/WU=
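As an aside, the permissions warning can be avoided entirely by setting a restrictive umask in a subshell and generating both keys in one pass, for example:

helper # (umask 077; wg genkey | tee helper-private-key | wg pubkey > helper-public-key)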

We generated another private/public key pair on Kevin’s remote worker, and also changed the permissions on it:

remote # wg genkey > remote-private-key
Warning: writing to world accessible file.
Consider setting the umask to 077 and trying again.
remote # chmod 600 remote-private-key
remote # cat remote-private-key
iKfE7LDyqhnfXjmiT4yg21Q6in8zRrpxw8+HFouIEU4=

We again used the private key to generate the corresponding public key, saved to another file in the same directory:

remote # wg pubkey < remote-private-key > remote-public-key
remote # cat remote-public-key
NycldbWN6NSkY4eEAEMQuH7nOCfe0HfxwuY5L7toBlQ=

After generating all the keys, we created a configuration file for the WireGuard interface on each side. The subnet we chose for our WireGuard tunnel (10.0.0.0/24) is arbitrary, with two caveats: it must not conflict with anything on the local networks, and both sides of the tunnel must use the same subnet. As shown in the table above, Steve’s network uses 192.168.30.0/24 for the cluster (and everything else that lives in the home lab) and Kevin’s network uses 192.168.40.0/24 for everything.

Steve’s helper’s /etc/wireguard/wg0.conf file is shown below.  Note that the AllowedIPs should be configured for the WireGuard internal IP and the remote worker’s subnet.

# steve’s helper
[Interface]
Address = 10.0.0.1/24
# steve’s private key below
PrivateKey = qF6R37cuXJ+pnKCDbEQKjOorEcQL6pDEdo2zYH09zU0= 
ListenPort = 51820
# kevin’s remote worker
[Peer]
# kevin’s public key below
PublicKey = NycldbWN6NSkY4eEAEMQuH7nOCfe0HfxwuY5L7toBlQ=
# 5.6.7.8 is kevin’s home lab’s public ip
Endpoint = 5.6.7.8:51820 
AllowedIPs = 10.0.0.2/32,192.168.40.0/24
PersistentKeepalive = 25

Kevin’s remote worker’s /etc/wireguard/wg0.conf file is shown below.  Note that the AllowedIPs should be configured for the WireGuard internal IP and the subnet where the helper and the rest of the cluster reside.

# kevin’s remote worker
[Interface]
Address = 10.0.0.2/24
# kevin’s private key below
PrivateKey = iKfE7LDyqhnfXjmiT4yg21Q6in8zRrpxw8+HFouIEU4=
ListenPort = 51820
# steve’s helper
[Peer]
# steve’s public key below
PublicKey = iJxQgLqHntxoSTDcchl3bo2xsqRVyAf0uYOhGkzu/WU=
# 1.2.3.4 is steve’s home lab’s public ip
Endpoint = 1.2.3.4:51820
AllowedIPs = 10.0.0.1/32,192.168.30.0/24
PersistentKeepalive = 25
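Before wiring anything into systemd, the configuration files can be sanity-checked by bringing the tunnel up by hand on both sides and tearing it back down:

helper & remote # wg-quick up wg0
helper & remote # wg-quick down wg0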

Network Connectivity From Cluster to Remote Worker

Allowing traffic from the cluster network to reach the remote worker over the WireGuard tunnel is necessary so that commands such as oc logs and oc debug, which call back to the kubelet on the node, keep working.

To accomplish this, we used nmcli on each member of the existing cluster to create a persistent static route that directs traffic for the remote worker through the helper and across the tunnel. In our cluster, all the nodes have a single network interface named enp1s0, so we ran the following commands on each node:

nodeX # nmcli con modify enp1s0 +ipv4.routes "192.168.40.100/32 192.168.30.100"
nodeX # nmcli con up enp1s0
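The new entry can be verified on each node; on our hosts it looked along these lines (the metric may differ):

nodeX # ip route | grep 192.168.40.100
192.168.40.100 via 192.168.30.100 dev enp1s0 proto static metric 100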

On the helper, we enabled packet forwarding to allow bidirectional traffic to flow through the WireGuard tunnel. This is done by setting a sysctl value and configuring some firewall rules. To set the value temporarily:

helper # sysctl -w net.ipv4.ip_forward=1

Persistently setting this value is done by editing /etc/sysctl.d/99-sysctl.conf and adding the following line:

net.ipv4.ip_forward = 1
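The persistent setting can be applied and confirmed without a reboot:

helper # sysctl --system
helper # sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1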

We also created firewall rules to allow traffic to masquerade through the helper to the rest of the cluster network.  The first rule allows traffic to masquerade from the remote worker to the cluster, and the second rule allows traffic to go in the other direction. For simplicity we’ve created rules that allow both subnets through the tunnel because we wanted to avoid creating rules for every single node in the cluster, in both directions. This also mirrors the AllowedIPs WireGuard configuration.

helper # firewall-cmd --add-rich-rule=\
"rule family=ipv4 destination address=192.168.30.0/24 masquerade" --permanent # to local subnet
helper # firewall-cmd --add-rich-rule=\
"rule family=ipv4 destination address=192.168.40.0/24 masquerade" --permanent # to remote subnet
helper # firewall-cmd --reload
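After the reload, the two masquerade rules should show up in the runtime configuration:

helper # firewall-cmd --list-rich-rules
rule family="ipv4" destination address="192.168.30.0/24" masquerade
rule family="ipv4" destination address="192.168.40.0/24" masquerade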

Next, enable and restart the wg-quick service on both the helper and the remote worker so it picks up the configuration files we created:

helper # systemctl enable wg-quick@wg0.service
helper # systemctl restart wg-quick@wg0.service
remote # systemctl enable wg-quick@wg0.service
remote # systemctl restart wg-quick@wg0.service

These configuration changes are persistent, so nothing else is required for them to keep working after a reboot.

Verify WireGuard Connectivity

Run the wg command to display the current WireGuard connections. If you see a recent handshake and transfer data in both directions, your tunnel has been established and both sides can see each other:

helper # wg
interface: wg0
public key: iJxQgLqHntxoSTDcchl3bo2xsqRVyAf0uYOhGkzu/WU=
private key: (hidden)
listening port: 51820
peer: NycldbWN6NSkY4eEAEMQuH7nOCfe0HfxwuY5L7toBlQ=
endpoint: 5.6.7.8:51820
allowed ips: 10.0.0.2/32, 192.168.40.0/24
latest handshake: 1 minute, 22 seconds ago
transfer: 374.38 KiB received, 212.43 KiB sent
persistent keepalive: every 25 seconds
remote # wg
interface: wg0
public key: NycldbWN6NSkY4eEAEMQuH7nOCfe0HfxwuY5L7toBlQ=
private key: (hidden)
listening port: 51820
peer: iJxQgLqHntxoSTDcchl3bo2xsqRVyAf0uYOhGkzu/WU=
endpoint: 1.2.3.4:51820
allowed ips: 10.0.0.1/32, 192.168.30.0/24
latest handshake: 1 minute, 15 seconds ago
transfer: 83.99 KiB received, 150.52 KiB sent
persistent keepalive: every 25 seconds

Ping from helper to remote using WireGuard IP:

helper # ip a
...
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
  link/ether 52:54:00:0c:86:75 brd ff:ff:ff:ff:ff:ff
  inet 192.168.30.100/24 brd 192.168.30.255 scope global dynamic noprefixroute enp1s0
...
3: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
  link/none
  inet 10.0.0.1/24 scope global wg0
helper # ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=41.3 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=48.6 ms

Ping from remote to helper using WireGuard IP:

remote # ip a
...
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
  link/ether 00:50:56:ad:86:84 brd ff:ff:ff:ff:ff:ff
  inet 192.168.40.100/24 brd 192.168.40.255 scope global noprefixroute ens192
...
8: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
  link/none
  inet 10.0.0.2/24 scope global wg0
remote # ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=44.9 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=40.0 ms

SSH from the helper to the remote worker, and from the remote worker back to the helper:

helper # ssh someuser@192.168.40.100
someuser@192.168.40.100's password:
[someuser@remote ~]$
remote # ssh anotheruser@192.168.30.100
anotheruser@192.168.30.100's password:
[anotheruser@helper ~]$

Adding the Remote Worker

Once we were satisfied that we could communicate over the WireGuard tunnel, we continued on with adding the remote node to the cluster. As a security best practice, we have turned off root login via SSH on our endpoints, so we run the Ansible playbook as root on the helper node, connect to the remote worker as a non-root user, and sudo to root. To streamline the Ansible work, we exchanged SSH keys from the helper’s root user to the remote worker’s non-root user. We followed the standard documentation on adding RHEL nodes to our OpenShift 4 cluster.

helper # ansible-playbook -i /<path>/inventory/hosts playbooks/scaleup.yml
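The inventory file referenced above follows the format from the scale-up documentation; a minimal sketch (the SSH user, kubeconfig path, and hostname below are placeholders for our actual values):

[all:vars]
ansible_user=someuser
ansible_become=True
openshift_kubeconfig_path="~/.kube/config"

[new_workers]
remote.kevinslabdomain.com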

After the playbook finished, the node didn’t immediately show up for us. We needed to approve the pending certificate signing requests (CSRs) before it could become a full member of the cluster.
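On the helper, we listed the pending requests and approved them; a sketch of the commands (the individual CSR names are generated and will differ):

helper # oc get csr
helper # oc get csr -o name | xargs oc adm certificate approve

With the CSRs approved, the new node appeared alongside the rest of the cluster: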

helper # oc get nodes
NAME                          STATUS   ROLES    AGE     VERSION
master1.steveslabdomain.com   Ready    master   3h34m   v1.19.0+9f84db3
master2.steveslabdomain.com   Ready    master   3h30m   v1.19.0+9f84db3
master3.steveslabdomain.com   Ready    master   3h22m   v1.19.0+9f84db3
worker1.steveslabdomain.com   Ready    worker   132m    v1.19.0+9f84db3
worker2.steveslabdomain.com   Ready    worker   132m    v1.19.0+9f84db3
remote.kevinslabdomain.com    Ready    worker   6m54s   v1.19.0+7070803

Conclusion

Including WireGuard in our architecture made the connectivity between the main cluster and the remote node easy to accomplish. Setting up the tunnel took the most time and energy, but that was primarily time spent learning the tool. Now that we have working configurations, it has been easy to share them with peers and add even more remote workers to this cluster with little effort.

Have fun building at your network’s edge!


About the authors

Kevin Chung is a Principal Architect focused on assisting enterprise customers in design, implementation and knowledge transfer through a hands-on approach to accelerate adoption of their managed OpenShift container platform.
