Central to the functionality of many Kubernetes deployments, including every OpenShift cluster, is etcd, the key/value store responsible for persisting the configuration of, well, everything in the cluster. When etcd is unhealthy, the cluster is unhealthy. When etcd performs poorly, the cluster performs poorly. However, etcd doesn’t have to be a mysterious and scary component!

Today, we’re joined on the stream by Anand Chandramohan, Product Manager for etcd at Red Hat, to discuss, among other things, how etcd works, its performance requirements and troubleshooting, backups, periodic maintenance, and much more.

As always, please see the list below for additional links to specific topics, questions, and supporting materials for the episode!

If you’re interested in more streaming content, please subscribe to the OpenShift.tv streaming calendar to see the upcoming episode topics and to receive any schedule changes. If you have questions or topic suggestions for the OpenShift Administrator’s Office Hour, please contact us via Discord, Twitter, or come join us live, Wednesdays at 11am EST / 1600 UTC, on YouTube and Twitch.

Episode 21 recorded stream:

 

 

Supporting links for today’s topic:

  • Jump straight to the start of today’s topic with this link: https://youtu.be/uFlr9gho99o?t=645 
  • Standalone etcd documentation can be found here.
  • etcd uses the Raft protocol to authoritatively store and protect data. For more information on how it works, there’s an easy-to-grok page here.
  • Within OpenShift, the control plane etcd cluster is managed by the Cluster etcd Operator.
  • The performance of etcd is directly affected by storage and network latency. You can find Red Hat’s suggested requirements here.
  • If you want to check the performance of your storage and gauge whether or not it’s capable of hosting an OpenShift etcd instance, use this test from IBM. If you already have a cluster deployed and running, you can check its performance with the same test by following this KCS.
  • The documentation also includes several recommended host practices for etcd, including some statistics to monitor, along with how and when to defragment the etcd datastore.
  • It is recommended that disk partitioning for OpenShift Container Platform be left to the installer. However, there are cases where you might want to create separate partitions in a part of the filesystem that you expect to grow.
    • /var/lib/etcd: Holds data that you might want to keep separate for purposes such as performance optimization of etcd storage.
  • How to mount a separate disk to /var/lib/containers on OpenShift nodes. The same approach can be used to mount secondary storage for /var/lib/etcd.
  • We strongly recommend creating backups of the etcd data to ensure that, in the event of an error, corruption, or other data loss, you can quickly recover the cluster back to its last known good state.
  • By default, etcd data is not encrypted in OpenShift Container Platform. You can enable etcd encryption for your cluster to provide an additional layer of data security. For example, it can help protect against the loss of sensitive data if an etcd backup is exposed to the wrong parties.
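
Since etcd relies on Raft, a write is only committed once a majority (quorum) of members agree on it, which is why member counts and member loss matter so much for cluster health. The arithmetic is easy to sketch. The snippet below is only an illustration: the function names `quorum` and `fault_tolerance` are hypothetical helpers, not part of etcd or OpenShift.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show the pattern.
    for n in (1, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} "
              f"tolerated_failures={fault_tolerance(n)}")
```

Note that going from three members to four raises the quorum without raising the number of failures the cluster can tolerate, which is one reason odd member counts are the usual choice.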

Other questions answered during the stream: