How to create and scale 6,000 virtual machines in 7 hours with Red Hat OpenShift Virtualization

In this learning path by Boaz Ben Shabat, learn how to create a large number of virtual machines for a production infrastructure in less than a weekend's worth of time.

Hour 3: Red Hat Ceph Storage setup and deployment

1 hr

With the key network configuration for Red Hat® Ceph® Storage in place, it is time to set up and deploy the storage cluster.

What will you learn?

  • How to set up and deploy Red Hat Ceph Storage for the purposes of this mass scaling exercise

What you need before starting:

  • Completed tuned profile and network buffer configuration (covered in Hour 2: Buffer tuning)

Red Hat Ceph Storage deployment

Before deployment, please ensure that all hosts can reach each other over SSH seamlessly, without requiring manual password entry or host key approval.
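
One way to set this up from the designated bootstrap node is to generate a key pair and push it to every cluster member, accepting each host key on the first connection. This is a minimal sketch that assumes root access and the example hostnames used later in this guide; adjust the user and key type to your environment:

# Generate a key pair on the bootstrap node (skip if one already exists)
$ ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
# Copy the public key to every host that will join the cluster
$ for host in Host1.domain.name Host2.domain.name Host3.domain.name; do ssh-copy-id -i ~/.ssh/id_ed25519.pub "root@${host}"; done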

The following repositories need to be enabled in order to deploy Red Hat Ceph Storage:

$ subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms --enable=ansible-2.9-for-rhel-8-x86_64-rpms --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms --enable=rhel-8-for-x86_64-appstream-rpms
  1. Install the required packages:

    $ dnf install podman ansible cephadm-ansible cephadm -y
  2. Log in to the registry:

    $ podman login -u user -p password registry.redhat.io
  3. Create a file containing a list of the hosts for the Red Hat Ceph Storage cluster:

    Host1.domain.name
    Host2.domain.name
    Host3.domain.name
  4. Now we can run the pre-deployment (preflight) playbook:

    $ cd /usr/share/cephadm-ansible/
    $ ansible-playbook -i /path/to/hosts_file cephadm-preflight.yml --extra-vars "ceph_origin=rhcs"

    The files below will need to be copied to every other host that is a member of the Red Hat Ceph Storage cluster. Make sure they are copied to the same path (/etc/ceph/) on those hosts; a copy example follows the bootstrap step below:

    /etc/ceph/ceph.client.admin.keyring 
    /etc/ceph/ceph.conf
  5. Create a file containing the registry login details (this file is passed to --registry-json in the next step):

    {"url":"registry.redhat.io",                                                                                                                                                 
    "username": "email@user.name",                                                                                                                                           
    "password": "my-password"                         
    }     
  6. Now we can start the deployment:

    $ cephadm bootstrap --mon-ip bond_ip_from_any_cluster_member --registry-json /etc/ceph/ceph_login_details_file --allow-fqdn-hostname --yes-i-know --cluster-network bond_subnet_mask # e.g. 192.168.0.0/16
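
The bootstrap step generates /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring on the bootstrap node. One way to distribute them to the other cluster members, as noted in step 4, is shown below; this is a minimal sketch that assumes the example hostnames from step 3 and the passwordless root SSH access set up earlier:

# Copy the admin keyring and configuration to the same path on every other host
$ for host in Host2.domain.name Host3.domain.name; do ssh "root@${host}" mkdir -p /etc/ceph; scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring "root@${host}:/etc/ceph/"; done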

Once the deployment has completed successfully, we can add all of the hosts and their roles to the cluster. The simplest way to do this is through the cephadm shell. Here is an example:

$ cephadm shell
$ ceph orch host add Host1.domain.name bond_ip --labels=osd,mgr,dashboard
$ ceph orch host add Host2.domain.name bond_ip --labels=osd,mgr
$ ceph orch host add Host3.domain.name bond_ip --labels=osd,mon
$ ceph orch host add Host4.domain.name bond_ip --labels=osd,mon
$ ceph orch host add Host5.domain.name bond_ip --labels=osd,grafana
$ ceph orch host add Host6.domain.name bond_ip --labels=osd,prometheus
$ ceph orch host add Host7.domain.name bond_ip --labels=osd,mdss
$ ceph orch host add Host8.domain.name bond_ip --labels=osd,mdss
$ ceph orch host add Host9.domain.name bond_ip --labels=osd
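
To verify that the hosts and labels were registered as expected (an optional check, not part of the original procedure), the orchestrator can list them from the same cephadm shell:

# List cluster hosts with their labels, then the daemons placed on them
$ ceph orch host ls
$ ceph orch ps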

Make sure to follow best practices for high availability (HA) and performance; these can vary depending on the cluster hardware and the number of hosts in the cluster.

Red Hat Ceph Storage Setup

Pool Creation

This section describes the Red Hat Ceph Storage-specific tuning performed on the storage nodes to cater to this large-scale environment. On this specific setup, we created a single pool for the NVMe and SSD disks:

$ ceph osd pool create ocp_pool
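
Assuming the pool will be consumed as RBD block storage for the virtual machines (the PG guidance below references rbd best practice), it should also be initialized for the rbd application before clients use it; this step is an addition not shown in the original walkthrough:

# Tag the pool for RBD use and initialize it
$ rbd pool init ocp_pool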

Note that for spinning disks it is highly recommended to create a separate pool in order to optimize cluster performance. We can do that by creating CRUSH rules:

$ ceph osd crush rule create-replicated replicated_hdd default host hdd
$ ceph osd crush rule create-replicated replicated_ssd default host ssd 

Then apply those rules to the pools. For example:

$ ceph osd pool set hdd_pool crush_rule replicated_hdd

Or:

$ ceph osd pool set nvme_pool crush_rule replicated_ssd
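
Note that hdd_pool and nvme_pool above are illustrative names; a pool has to exist before a CRUSH rule can be assigned to it. A minimal end-to-end sketch, assuming those pool names:

# Create a pool per device class and bind it to the matching CRUSH rule
$ ceph osd pool create hdd_pool
$ ceph osd pool set hdd_pool crush_rule replicated_hdd
$ ceph osd pool create nvme_pool
$ ceph osd pool set nvme_pool crush_rule replicated_ssd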

Placement Groups Tuning

We can achieve the optimal number of PGs per pool by targeting 100 PGs per OSD (the best-practice value for rbd and librados workloads), multiplying that target by the number of OSDs and by the maximum used capacity of the pool (the default target is 85%), dividing by the number of replicas, and rounding to the nearest power of 2, i.e., 2^(round(log2(x))). For this cluster that came out to 8192:

$ ceph osd pool set ocp_pool pg_autoscale_mode off
$ ceph osd pool set ocp_pool pg_num 8192

For example, for a setup with 200 disks (OSDs):
(100 * 200 * 0.85) / 3 ≈ 5667, which rounds to the nearest power of 2 as 4096 total PGs.

We scripted it using bc:

$ echo "x=l(100*200*0.85/3)/l(2); scale=0; 2^((x+0.5)/1)" | bc -l

Note that we can increase the number of PGs per OSD even further; that can potentially reduce the variance in per-OSD load across the cluster, but each PG requires a bit more CPU and memory on the OSDs that store it, so the PG count should be tested and tuned per environment.
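
Also keep in mind that the monitors enforce a per-OSD PG ceiling (mon_max_pg_per_osd, 250 by default in recent releases) and may refuse a pg_num increase that exceeds it. Raising that limit is an extra step not covered above; for example:

$ ceph config set global mon_max_pg_per_osd 400 # illustrative value, tune per environment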

Prometheus Tuning

For monitoring the Red Hat Ceph Storage cluster, we can use the Red Hat Ceph Storage dashboard. To enable RBD statistics collection for the pool, we can run:

$ ceph config set mgr mgr/prometheus/rbd_stats_pools ocp_pool

To lessen the load on the system for large clusters, we can throttle the pool stats collection with:

$ ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

It's also a good idea to lower the polling rate for Prometheus in order to avoid turning the Ceph Manager (ceph-mgr) into a bottleneck, which could leave other ceph-mgr modules without time to run.

In this case, the following command sets the scraping interval to 60 seconds.

$ ceph config set mgr mgr/prometheus/scrape_interval 60

This learning path is for operations teams and system administrators. Developers may want to check out Foundations of OpenShift on developers.redhat.com.
