This blog explains how to leverage local NVMe disks that are present on the vSphere hypervisors inside of OpenShift Container Storage (OCS). The disks are forwarded directly to the VMs to keep latency low. This feature is available as a Tech Preview in OpenShift Container Storage 4.3.
Another deployment option is to use the NVMe devices as a VMware datastore, which allows them to be shared with other VMs. That second option is not discussed here.
Environment
Tested on VMware vSphere 6.7 U3 with the latest patches installed as of Feb 26 2020, using local NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) drives.
OpenShift 4.3.1 running on Red Hat Enterprise Linux CoreOS 43.81.202002032142.0
"nodeInfo": {
"kernelVersion": "4.18.0-147.3.1.el8_1.x86_64",
"osImage": "Red Hat Enterprise Linux CoreOS 43.81.202002032142.0 (Ootpa)",
"containerRuntimeVersion": "cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8",
"kubeletVersion": "v1.16.2",
"kubeProxyVersion": "v1.16.2",
"operatingSystem": "linux",
"architecture": "amd64"
},
quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3
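For reference, the node and release information above can be collected from a running cluster roughly like this (jq on the workstation is assumed; compute-2 is simply the node used in this example):
➜ website git:(master) ✗ oc get node compute-2 -o json | jq .status.nodeInfo
➜ website git:(master) ✗ oc get clusterversion version -o jsonpath='{.status.desired.image}{"\n"}'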
Preparing the disks
The NVMe disks must not be in use for anything else.
When checking the disks in the host's storage view, they should appear as in Figure 1: attached, but “Not consumed”.

Figure 1: NVMe disk is attached, but not consumed
Click on the available NVMe drive and note the multipath path. In my example it states:
Path Selection Policy Fixed (VMware) - Preferred Path (vmhba2:C0:T0:L0)
Now make sure the SSH service is started. In the host configuration screen, go to System → Services, find the “SSH” service in the list, and make sure it is in the Running state.

Figure 2: The SSH service needs to be running
Connect to the vSphere host via SSH. Use the root user and the password you set during the installation. Once connected, execute:
# lspci | grep NVMe
0000:af:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba1]
0000:b0:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba2]
0000:b1:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba3]
0000:b2:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba4]
0000:d8:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba5]
0000:d9:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND] 1.6TB 2.5" U.2 (P4600) [vmhba6]
Identify the adapter you noted earlier (in our case vmhba2) and note its PCI location (the first block on the line, in this case 0000:b0:00.0).
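If you want to cross-check the vmhba-to-PCI mapping from the same shell, esxcli can list the storage adapters together with their PCI addresses (the exact output format varies between ESXi builds):
# esxcli storage core adapter list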
Go back to the vCenter UI and, still in the host configuration, scroll to “Hardware” → “PCI Devices”. In that view, click “Configure Passthrough”. In the list you will find the disk you noted earlier by its PCI path.

Figure 3: Configure PCI passthrough for NVMe disk
Afterwards, the new disk will be listed as “Available (pending)” for passthrough, and you will be prompted to restart the hypervisor. Reboot the hypervisor now.
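If you prefer to reboot from the SSH session instead of the UI, something along the following lines should work; note that the host normally has to be in maintenance mode first, so evacuate or shut down any running VMs before doing this:
# esxcli system maintenanceMode set --enable true
# esxcli system shutdown reboot --reason "Enable NVMe passthrough"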

Figure 4: NVMe has been added to passthrough devices, but the hypervisor has not been rebooted yet

Figure 5: NVMe is Available after Hypervisor reboot
After the hypervisor has been rebooted, the NVMe disk should be available, just like in Figure 5.
Adding the disks to the VM
Now that the NVMe disk is prepared, we have to add it to the VM. For this, the VM has to be powered down. Once the VM is off, open the VM settings and add the following items:
- Add an NVMe controller
  - This is optional, but should speed up storage requests in the VM
- Add a PCI device
  - Note that the VM needs to be scheduled on the host where your PCI device is present
- Add a Hard Disk (the default 16GB capacity is fine)
  - This will be used for the Ceph Monitor filesystem
Afterwards, your VM settings should look similar to Figure 6.

Figure 6: VM settings after NVMe has been added
Expand the New PCI device entry (just like in Figure 6) and click the “Reserve all memory” button. Close the settings and power on the VM.
You can verify that the NVMe has been successfully added by running lsblk on the VM:
[core@compute-2 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 384M 0 part /boot
|-sda2 8:2 0 127M 0 part /boot/efi
|-sda3 8:3 0 1M 0 part
`-sda4 8:4 0 59.5G 0 part
`-coreos-luks-root-nocrypt 253:0 0 59.5G 0 dm /sysroot
sdb 8:16 0 16G 0 disk
nvme0n1 259:0 0 1.5T 0 disk
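If you prefer not to SSH into the node, the same check can be done with a debug pod from your workstation:
➜ website git:(master) ✗ oc debug node/compute-2 -- chroot /host lsblk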
Creating PVs with the local disks
To create PVs that we can eventually use with OCS, we will use the local-storage operator. First, list the nodes in the cluster:
➜ website git:(master) ✗ oc get no
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 44h v1.16.2
compute-1 Ready worker 44h v1.16.2
compute-2 Ready worker 44h v1.16.2
compute-3 Ready worker 44h v1.16.2
compute-4 Ready worker 44h v1.16.2
compute-5 Ready worker 44h v1.16.2
control-plane-0 Ready master 44h v1.16.2
control-plane-1 Ready master 44h v1.16.2
control-plane-2 Ready master 44h v1.16.2
Apply the necessary labels to the nodes that will later be used by OCS
➜ website git:(master) ✗ oc label node compute-0 topology.rook.io/rack=rack0
node/compute-0 labeled
➜ website git:(master) ✗ oc label node compute-1 topology.rook.io/rack=rack1
node/compute-1 labeled
➜ website git:(master) ✗ oc label node compute-2 topology.rook.io/rack=rack2
node/compute-2 labeled
➜ website git:(master) ✗ oc label node compute-0 "cluster.ocs.openshift.io/openshift-storage="
node/compute-0 labeled
➜ website git:(master) ✗ oc label node compute-1 "cluster.ocs.openshift.io/openshift-storage="
node/compute-1 labeled
➜ website git:(master) ✗ oc label node compute-2 "cluster.ocs.openshift.io/openshift-storage="
node/compute-2 labeled
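The same labels can also be applied in one go with a small shell loop, assuming the rack index should simply follow the compute index as above:
➜ website git:(master) ✗ for i in 0 1 2; do oc label node compute-$i topology.rook.io/rack=rack$i; oc label node compute-$i "cluster.ocs.openshift.io/openshift-storage="; done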
Verify that the node labels have been applied as expected
➜ website git:(master) ✗ oc get node -l topology.rook.io/rack
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 44h v1.16.2
compute-1 Ready worker 44h v1.16.2
compute-2 Ready worker 44h v1.16.2
➜ website git:(master) ✗ oc get node -l cluster.ocs.openshift.io/openshift-storage
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 44h v1.16.2
compute-1 Ready worker 44h v1.16.2
compute-2 Ready worker 44h v1.16.2
➜ website git:(master) ✗ oc new-project local-storage
Now using project "local-storage" on server [...]
Now go to the OpenShift web UI and install the “local-storage” operator from OperatorHub. Make sure to select the “local-storage” namespace as the install target.
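If you prefer the CLI over the web UI, an OperatorGroup plus Subscription roughly like the following should achieve the same result. The channel, package name, and catalog source used here are assumptions, so verify them against what OperatorHub shows on your cluster:
➜ website git:(master) ✗ cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: local-storage
  namespace: local-storage
spec:
  targetNamespaces:
  - local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: local-storage
spec:
  channel: "4.3"  # assumption - use the channel OperatorHub offers for your OCP version
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF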
➜ website git:(master) ✗ oc get po
NAME READY STATUS RESTARTS AGE
local-storage-operator-77f887bfd9-t9lx7 1/1 Running 0 4m43s
Verify which disk names are used on your machines:
➜ website git:(master) ✗ for i in $(seq 4 6); do ssh core@10.70.56.9$i lsblk; done
Warning: Permanently added '10.70.56.94' (ECDSA) to the list of known hosts.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 384M 0 part /boot
|-sda2 8:2 0 127M 0 part /boot/efi
|-sda3 8:3 0 1M 0 part
`-sda4 8:4 0 59.5G 0 part
`-coreos-luks-root-nocrypt 253:0 0 59.5G 0 dm /sysroot
sdb 8:16 0 16G 0 disk
nvme1n1 259:0 0 1.5T 0 disk
Warning: Permanently added '10.70.56.95' (ECDSA) to the list of known hosts.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 384M 0 part /boot
|-sda2 8:2 0 127M 0 part /boot/efi
|-sda3 8:3 0 1M 0 part
`-sda4 8:4 0 59.5G 0 part
`-coreos-luks-root-nocrypt 253:0 0 59.5G 0 dm /sysroot
sdb 8:16 0 16G 0 disk
nvme0n1 259:0 0 1.5T 0 disk
Warning: Permanently added '10.70.56.96' (ECDSA) to the list of known hosts.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 384M 0 part /boot
|-sda2 8:2 0 127M 0 part /boot/efi
|-sda3 8:3 0 1M 0 part
`-sda4 8:4 0 59.5G 0 part
`-coreos-luks-root-nocrypt 253:0 0 59.5G 0 dm /sysroot
sdb 8:16 0 16G 0 disk
nvme0n1 259:0 0 1.5T 0 disk
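Tip: device names like /dev/nvme0n1 are not guaranteed to stay the same across reboots. If you prefer stable paths, note the corresponding /dev/disk/by-id entries and use those in the devicePaths below instead, for example:
➜ website git:(master) ✗ ssh core@10.70.56.94 'ls -l /dev/disk/by-id/ | grep nvme'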
Now create the LocalVolume entities that will create PVs for the local disks.
NOTE: In our example the NVMe disks showed up under two different names, which is why both are listed in the local-block entry.
Make sure to adjust the devicePaths in both LocalVolume instances as necessary. The local-block LocalVolume should target your NVMe drives; local-fs should target the 16GB hard disk (sdb in our example).
➜ website git:(master) ✗ cat <<EOF | oc create -n local-storage -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-block
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: cluster.ocs.openshift.io/openshift-storage
        operator: Exists
  storageClassDevices:
  - storageClassName: local-block
    volumeMode: Block
    devicePaths:
    - /dev/nvme0n1
    - /dev/nvme1n1
EOF
localvolume.local.storage.openshift.io/local-block created
➜ website git:(master) ✗ cat <<EOF | oc create -n local-storage -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-fs
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: cluster.ocs.openshift.io/openshift-storage
        operator: Exists
  storageClassDevices:
  - storageClassName: local-fs
    fsType: xfs
    volumeMode: Filesystem
    devicePaths:
    - /dev/sdb
EOF
localvolume.local.storage.openshift.io/local-fs created
Now verify that the DaemonSets, PVs, and Pods exist and look similar to the output below:
➜ website git:(master) ✗ oc get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
local-block-local-diskmaker 3 3 3 3 3 <none> 4m49s
local-block-local-provisioner 3 3 3 3 3 <none> 4m49s
local-fs-local-diskmaker 3 3 3 3 3 <none> 16s
local-fs-local-provisioner 3 3 3 3 3 <none> 16s
➜ website git:(master) ✗ oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-3db3ff28 1490Gi RWO Delete Available local-block 3m28s
local-pv-53f7eacf 16Gi RWO Delete Available local-fs 6s
local-pv-6f7637fd 1490Gi RWO Delete Available local-block 4m1s
local-pv-88d91069 16Gi RWO Delete Available local-fs 6s
local-pv-c279a4a2 16Gi RWO Delete Available local-fs 6s
local-pv-cdfce476 1490Gi RWO Delete Available local-block 3m54s
➜ website git:(master) ✗ oc get po
NAME READY STATUS RESTARTS AGE
local-block-local-diskmaker-2fjjc 1/1 Running 0 5m19s
local-block-local-diskmaker-fm2xj 1/1 Running 0 5m19s
local-block-local-diskmaker-qj2t4 1/1 Running 0 5m20s
local-block-local-provisioner-k5mlj 1/1 Running 0 5m20s
local-block-local-provisioner-pvgm2 1/1 Running 0 5m20s
local-block-local-provisioner-t6bwp 1/1 Running 0 5m20s
local-fs-local-diskmaker-jxdbk 1/1 Running 0 47s
local-fs-local-diskmaker-rwmmv 1/1 Running 0 47s
local-fs-local-diskmaker-z4lh4 1/1 Running 0 47s
local-fs-local-provisioner-9w4jg 1/1 Running 0 47s
local-fs-local-provisioner-kkqxq 1/1 Running 0 47s
local-fs-local-provisioner-xsn6v 1/1 Running 0 47s
local-storage-operator-77f887bfd9-t9lx7 1/1 Running 0 11m
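The local-storage operator also creates one StorageClass per storageClassDevices entry, so local-block and local-fs should now show up in:
➜ website git:(master) ✗ oc get sc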
Setting up OCS to use the local disks
Create the OCS namespace
➜ website git:(master) ✗ cat << EOF | oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-storage
spec: {}
EOF
namespace/openshift-storage created
Now go to the OpenShift web UI and install OCS through OperatorHub. Make sure to select the openshift-storage namespace as the install target.
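Before creating the StorageCluster, you can confirm that the operator installation has completed by checking its ClusterServiceVersion; the PHASE column should show Succeeded:
➜ website git:(master) ✗ oc get csv -n openshift-storage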
After OCS is successfully installed, create the StorageCluster as shown below:
➜ website git:(master) ✗ cat << EOF | oc create -f -
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  namespace: openshift-storage
  name: ocs-storagecluster
spec:
  manageNodes: false
  monPVCTemplate:
    spec:
      storageClassName: local-fs
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  resources:
    mon:
      requests: {}
      limits: {}
    mds:
      requests: {}
      limits: {}
    rgw:
      requests: {}
      limits: {}
    mgr:
      requests: {}
      limits: {}
    noobaa-core:
      requests: {}
      limits: {}
    noobaa-db:
      requests: {}
      limits: {}
  storageDeviceSets:
  - name: deviceset-a
    count: 3
    resources:
      requests: {}
      limits: {}
    placement: {}
    dataPVCTemplate:
      spec:
        storageClassName: local-block
        accessModes:
        - ReadWriteOnce
        volumeMode: Block
        resources:
          requests:
            storage: 500Gi
    portable: false
EOF
Now wait for the OCS cluster to initialise. You can watch the installation with
watch oc get po -n openshift-storage
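Once the pods have settled, the StorageCluster and the underlying Ceph cluster should eventually report a ready/healthy state (the exact columns depend on the OCS version):
➜ website git:(master) ✗ oc get storagecluster -n openshift-storage
➜ website git:(master) ✗ oc get cephcluster -n openshift-storage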
Q&A
Why use VMDirectPath I/O and not RDM?
Direct attached block devices cannot be used for RDM as stated in this VMware document.
How can I ensure the disks are clean before I use them with OCS?
You can run sudo sgdisk --zap-all /dev/nvmeXnX inside of your VMs before using them. If you have already installed OCS and the OSD prepare job is failing, you can safely run this command and the prepare jobs will retry.
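If you need to wipe the disks on several nodes, the same can be done without SSH via a debug pod; adjust the node and device names to your environment:
➜ website git:(master) ✗ oc debug node/compute-2 -- chroot /host sgdisk --zap-all /dev/nvme0n1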
Additional Resources
OpenShift Container Storage: openshift.com/storage
OpenShift | Storage YouTube Playlist
OpenShift Commons ‘All Things Data’ YouTube Playlist
Feedback
To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.
If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.3 features, take this brief 3-minute survey.