This is a guest post from Couchbase’s Sindhura Palakodety, Senior Technical Support Engineer. 

Couchbase is the first NoSQL vendor to have a generally available, production-certified operator for the Red Hat OpenShift Container Platform. The Couchbase Autonomous Operator enables enterprises to more quickly adopt the Couchbase Engagement Database in production to create and modernize their applications for the microservices era, in both multi-cloud and hybrid-cloud environments. Now, developers can create apps that are designed to harness the value of Kubernetes platforms to deliver the scale, agility, and velocity that their businesses require.

Today, Red Hat and its partners introduced the Operator Hub. Operators take advantage of Kubernetes’ extensibility to deliver the automation advantages of cloud services, such as provisioning, scaling, and backup/restore, while being able to run anywhere Kubernetes can run. The Operator Framework aims to simplify building and managing Operators by providing developer and runtime Kubernetes tools, thereby helping to accelerate Operator development. This framework includes the Operator SDK, the Operator Lifecycle Manager (which oversees the installation, updating, and lifecycle management of all Operators), and Operator Metering for usage reporting.

As of this blog’s publication, the Couchbase Autonomous Operator is supported in production on OpenShift Container Platform 3.11, and is available in developer preview as part of the Operator Framework Technology Preview.

History of the Couchbase Autonomous Operator

Since September 2018, Couchbase has shipped two stable releases of the Autonomous Operator, which are already on the market and in use by customers in many different environments.

In its 1.0 release, the Couchbase Autonomous Operator included exciting new capabilities such as:

  • Automated cluster provisioning
  • Automated failure recovery with custom Couchbase-specific automation
  • Cross datacenter replication (XDCR)
  • On-demand dynamic scaling
  • Rack/zone awareness
  • Auto-failover capabilities in Couchbase Server 5.5.1
  • Production-grade supportability features
  • Production certification of Open Source Kubernetes and Red Hat OpenShift Container platforms
  • Persistent storage support
  • Centralized configuration management
  • Enterprise-grade high availability features

In version 1.1, released in November 2018, a number of new supportability enhancements were introduced, such as persisting Couchbase logs to storage volumes to help troubleshoot pod failures on stateless deployments. It also enabled redaction of sensitive information from logs in order to help protect user data.

We are currently working on the next version of the Couchbase Autonomous Operator (version 1.2) which includes some innovative new features:

  • Fully automated upgrade of the Couchbase cluster
  • Rolling upgrade of Kubernetes without affecting the Couchbase cluster
  • Assisted deployment via Helm charts

We’re excited to get these new features into your hands in the next few months!

Couchbase Autonomous Operator Overview

Managing stateful applications such as Couchbase Server and other databases is a challenge since it requires application domain knowledge to correctly scale, upgrade, and reconfigure, while also protecting against data loss and unavailability. We want this application-specific operational knowledge to be encoded into the software that leverages the powerful Kubernetes abstractions to help run and manage the application correctly.

The goal of the Couchbase Autonomous Operator is to fully self-manage one or more Couchbase deployments so that you don’t need to worry about the operational complexities of running Couchbase. Not only is the Couchbase Autonomous Operator designed to automatically administer the Couchbase cluster, it can also self-heal and self-manage the cluster according to Couchbase best practices.

Architecture

At a high level, the architecture of the Couchbase Autonomous Operator consists of the following components:

  1. Server pods
  2. Services
  3. Volumes

When a Couchbase cluster is deployed, the Couchbase Autonomous Operator creates additional Kubernetes resources, such as server pods, services, and volumes, to support it. The resources originating from the Couchbase Autonomous Operator are labeled to make it easier to list and describe the resources belonging to a specific cluster.
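As an illustration, the metadata of an Operator-created server pod might look like the excerpt below for a hypothetical cluster named cb-example. The exact label keys are an assumption here and may vary by Operator version, so check the resources in your own deployment:

----
# Illustrative excerpt of an Operator-created server pod.
# The label keys shown are assumptions; verify the exact keys in your
# own deployment (for example, with oc get pods --show-labels).
apiVersion: v1
kind: Pod
metadata:
  name: cb-example-0000
  namespace: couchbase-operator-example
  labels:
    app: couchbase
    couchbase_cluster: cb-example
----

With labels like these in place, a selector query such as oc get pods -l couchbase_cluster=cb-example returns only the pods that belong to that cluster.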


Why not use Kubernetes StatefulSets?

Kubernetes StatefulSets are great for certain use-cases, but they don’t work that well for running complex software like databases. That is because StatefulSets focus on creating and managing pods, not on managing the software running on them. For example, if you wanted a four-node cluster and deployed Couchbase using StatefulSets, you would get four uninitialized Couchbase pods that don’t know about each other. It would then be up to you to join the nodes together into a cluster — and that means extra operational tasks.

Deploying a unique custom Couchbase controller gives Kubernetes Couchbase-specific knowledge, so that as each Couchbase pod is deployed, the controller can properly configure it and join it with the other Couchbase pods in the cluster. It’s also important to keep in mind that provisioning a cluster is just one place where having a custom controller helps to automate tasks — node failure, ad-hoc scaling, and other management tasks also require Couchbase-specific knowledge within Kubernetes in order to be properly automated.

Overview of the Couchbase Autonomous Operator on the OpenShift 3.11 Operator Framework


The above diagram depicts the fundamental components in the Couchbase cluster deployment on OpenShift:

  • OpenShift 3.11 with GlusterFS persistent volume storage and underlying Kubernetes cluster
  • Couchbase CRD defining the custom resources required for the Couchbase cluster (a minimal sketch of this CRD appears after this list)
  • Operator Framework comprising the Catalog Operator and Operator Lifecycle Manager (OLM)
  • Catalog Operator registers the Couchbase CRD with Kubernetes API Server
  • OLM installs and handles rolling updates of the Couchbase Operator
  • Once the Couchbase Operator is installed in an OpenShift project, the Couchbase cluster instances and services can be deployed by specifying the desired configuration in YAML format and passing it onto the Operator
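To make that more concrete, the custom resource definition registered with the Kubernetes API server looks roughly like the minimal sketch below. It is trimmed to the essentials; the actual CRD bundled with the Couchbase Operator contains additional detail:

----
# Minimal sketch of the CouchbaseCluster CRD that the Catalog Operator
# registers; the real definition shipped with the Operator is richer.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: couchbaseclusters.couchbase.com
spec:
  group: couchbase.com
  version: v1
  scope: Namespaced
  names:
    kind: CouchbaseCluster
    listKind: CouchbaseClusterList
    plural: couchbaseclusters
    singular: couchbasecluster
----

Once this CRD is registered, CouchbaseCluster objects (like the one created later in this walkthrough) become first-class resources that the Couchbase Operator can watch and reconcile.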

Using Kubernetes Persistent Volumes

Kubernetes persistent volumes offer a way to create Couchbase pods with data that resides outside of the actual pods themselves. This decoupling provides a higher degree of resilience for data within the Couchbase cluster when a node goes down or if its associated pod gets terminated. Likewise, persistent volumes can provide greater flexibility and efficiency in deployments because Kubernetes can automatically move Couchbase pods between nodes without worrying about any downtime or data loss. The Couchbase Autonomous Operator supports some of the most popular Kubernetes persistent volumes, such as GlusterFS, CephRBD, AWS, Azure Disk, GCE, and Portworx.
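As an example, the glusterfs-storage class used later in this walkthrough could be backed by a StorageClass definition along the lines of the sketch below. The provisioner name is the standard in-tree GlusterFS provisioner, but the resturl value is a placeholder for your own Heketi endpoint and will differ per environment:

----
# Hypothetical StorageClass for GlusterFS dynamic provisioning.
# The resturl is a placeholder; point it at your own Heketi service.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi-storage.example.com:8080"
reclaimPolicy: Delete
----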

How to Get Started with the Couchbase Autonomous Operator on the OpenShift Operator Framework

This guide is a summary of some of the content found in the Red Hat and Couchbase documentation. It provides a tutorial for using the OpenShift Operator Framework to set up Couchbase with persistent volumes.

Prerequisites

  • A Red Hat OpenShift Container Platform 3.11 environment with Technology Preview OLM enabled and GlusterFS persistent volume storage configured
  • Access to an OpenShift user that has cluster-admin privileges and access to the OpenShift Web Console
  • Couchbase Operator loaded to the Operator catalog (loaded by default with Technology Preview OLM)

Installing the Couchbase Operator

  1. Log in to the Red Hat OpenShift Web Console as a user with the cluster-admin role. Make sure you’re viewing the Cluster Console.
  2. Click Create Project. Enter a name for the project and click Create. (This example uses a project called couchbase-operator-example.)
  3. Go to Operators > Catalog Sources and find the Couchbase Operator.
  4. Click Create Subscription, and then on the following YAML page, click Create to install the Couchbase Operator to the project. (A rough sketch of this Subscription manifest appears after this list.)
  5. After creating the subscription, the Operators > Subscriptions screen is displayed. From here, check that the Upgrade Status shows Up to date and 1 installed.
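For reference, the Subscription manifest presented on that YAML page looks roughly like the sketch below. The channel, package name, and catalog source values here are assumptions and depend on how the Technology Preview catalog is configured in your environment:

----
# Illustrative OLM Subscription for the Couchbase Operator.
# The channel, name, source, and sourceNamespace values are assumptions;
# use the values pre-populated by your catalog.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: couchbase-enterprise
  namespace: couchbase-operator-example
spec:
  channel: preview
  name: couchbase-enterprise
  source: certified-operators
  sourceNamespace: openshift-marketplace
----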

Once the Couchbase Operator shows that it’s installed, you can run oc get pods on the master node to see that the Couchbase Operator pod is running in the namespace:

Deploying the Couchbase Cluster

  1. Before creating the Couchbase cluster, use the web console to create a secret that holds the credentials for the superuser account, based on the following definition. (The Couchbase Operator reads this secret upon startup and configures the database with these details.)
----
apiVersion: v1
kind: Secret
metadata:
 name: couchbase-admin-creds
 namespace: couchbase-operator-example
type: Opaque
stringData:
 username: admin
 password: password
----

To create the secret via the web console, first click Workloads > Secrets from the left navigation. Make sure that you’re in the couchbase-operator-example project, then click Create and choose Secret from YAML. Paste in the above definition YAML (or your own) and click Create.

2. Next, go to Operators > Cluster Service Versions and click on Couchbase Operator to go to the Couchbase Operator Overview screen.

3. Click the Create Couchbase Operator button to start creating the Couchbase cluster.

4. On the Create Couchbase Cluster screen, the web console displays a minimal starting CouchbaseCluster template that creates a three-node Couchbase cluster with services enabled and no persistent volumes configured. We want to replace this pre-populated template with the example configuration below, which creates a Couchbase cluster using GlusterFS persistent volume storage for the Data, Index, and Analytics services:
----
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
 name: cb-example
 namespace: couchbase-operator-example
spec:
 authSecret: couchbase-admin-creds
 baseImage: registry.connect.redhat.com/couchbase/server
 version: 5.5.3-3
 buckets:
   - conflictResolution: seqno
     enableFlush: true
     evictionPolicy: fullEviction
     ioPriority: high
     memoryQuota: 128
     name: default
     replicas: 1
     type: couchbase
 exposeAdminConsole: true
 adminConsoleServices:
   - data
 cluster:
   analyticsServiceMemoryQuota: 1024
   autoFailoverMaxCount: 3
   autoFailoverOnDataDiskIssues: true
   autoFailoverOnDataDiskIssuesTimePeriod: 120
   autoFailoverServerGroup: false
   autoFailoverTimeout: 120
   clusterName: cb-example
   dataServiceMemoryQuota: 256
   eventingServiceMemoryQuota: 256
   indexServiceMemoryQuota: 256
   indexStorageSetting: memory_optimized
   searchServiceMemoryQuota: 256
 servers:
   - size: 3
     name: all_services
     services:
       - data
       - index
       - query
       - search
       - eventing
       - analytics
     pod:
       volumeMounts:
         default: couchbase
         data: couchbase
         index: couchbase
         analytics:
           - couchbase
           - couchbase
 securityContext:
   fsGroup: 1000180000
   runAsUser: 1000180000
   runAsNonRoot: true
 volumeClaimTemplates:
   - metadata:
       name: couchbase
     spec:
       storageClassName: "glusterfs-storage"
       resources:
         requests:
          storage: 1Gi
----

You can make custom adjustments to the above CouchbaseCluster template, but for the purposes of this example, you must pay attention to the following parameters:
  • namespace: Make sure this matches the name of the project. You don’t need to modify this if you’re using the example project name (couchbase-operator-example).
  • authSecret: This is the secret we created in Step 1. You don’t need to modify this if you’re using the same secret that is in the example (couchbase-admin-creds).
  • fsGroup: This is a security parameter for the persistent volumes. If you go to Administration > Namespaces > couchbase-operator-example > YAML in the OpenShift web console, you’ll see a value for openshift.io/sa.scc.supplemental-groups. The annotation specifies a starting GID and a range size, so a value of 1000180000/10000 means that fsGroup must be set to a valid value between 1000180000 and 1000189999. For more information, see the fsGroup documentation.
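For reference, the relevant portion of the project definition looks something like the excerpt below; the GID range is environment-specific and will differ in your cluster:

----
# Excerpt of the project (namespace) object showing where the
# supplemental-groups range comes from; values are environment-specific.
apiVersion: v1
kind: Namespace
metadata:
  name: couchbase-operator-example
  annotations:
    openshift.io/sa.scc.supplemental-groups: 1000180000/10000
    openshift.io/sa.scc.uid-range: 1000180000/10000
----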

Once you have the YAML configured, paste it into the web console and click Create. The Couchbase Operator then starts up the pods, services, and all other components of the Couchbase cluster.

Note the following:

  • The Create Couchbase Operator button in the Operator’s Overview screen is somewhat of a misnomer, as the Couchbase Operator is already created and running at this point. Clicking this button deploys the Couchbase cluster, not the Couchbase Operator.
  • When clicking the Create Couchbase Operator button, you may receive a 404 error the first time. This is a known issue; as a workaround, refresh this page to continue. (BZ#1609731)

5. Shortly afterward, the Couchbase cluster is ready to use. Your project now contains a number of resources created and configured automatically by the Couchbase Operator:

6. Now it’s time to check that the persistent volumes were set up correctly. Go to Storage > Persistent Volume Claims from the left navigation to verify that the GlusterFS persistent volumes were created and bound to the Couchbase pods successfully:

 

7. The port details for accessing the Couchbase Web Console can be found in the cb-example-ui service under Networking > Services in the left-hand nav. You’ll see a set of Couchbase ports and their corresponding Node Ports. In this example, the Couchbase ports are 8091 (non-SSL) and 18091 (SSL). You can access the Couchbase Web Console on the Node Ports, which in this example are 32562 (non-SSL) and 31311 (SSL). Therefore you would point your browser to <node_ip>:32562, where <node_ip> is the IP address of any OpenShift node that hosts the Couchbase cluster.
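As a rough sketch, the Operator-generated cb-example-ui service resembles the manifest below. The nodePort values are assigned by Kubernetes at creation time (so yours will differ), the port names are illustrative, and the selector managed by the Operator is omitted:

----
# Rough sketch of the Operator-generated admin console service.
# nodePort values are assigned dynamically; port names are illustrative,
# and the selector managed by the Operator is omitted.
apiVersion: v1
kind: Service
metadata:
  name: cb-example-ui
  namespace: couchbase-operator-example
spec:
  type: NodePort
  ports:
    - name: couchbase-ui
      port: 8091
      nodePort: 32562
    - name: couchbase-ui-ssl
      port: 18091
      nodePort: 31311
----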

8. You can now connect to the Couchbase cluster using the credentials saved in the secret. Other application pods can mount and use this secret and communicate with the service.
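For example, a hypothetical application pod could surface those credentials as environment variables by referencing the same secret. The pod name and image below are placeholders, not part of this deployment:

----
# Hypothetical application pod that consumes the couchbase-admin-creds
# secret; the pod name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: couchbase-operator-example
spec:
  containers:
    - name: app
      image: example/my-app:latest
      env:
        - name: COUCHBASE_USERNAME
          valueFrom:
            secretKeyRef:
              name: couchbase-admin-creds
              key: username
        - name: COUCHBASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: couchbase-admin-creds
              key: password
----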

Now let’s take a look at one of the awesome features we talked about earlier: Automatic recovery with persistent volumes.

Auto Recovery with Persistent Volumes

The Couchbase Operator continuously monitors the Couchbase cluster for failures. When a node or server group failure is detected, the Couchbase Operator automatically creates a new instance, either on the same host machine (preferably) or on a different host machine. It then rebalances out the bad instances, adds the new instance, and brings the cluster back up to full capacity.

If a Couchbase cluster is configured with persistent volumes, the Couchbase Operator does the following during an auto recovery event:

  • Creates a new instance and attaches it to the same persistent volume
  • Performs complex Couchbase operations such as delta-node recovery and warm-up, which reduce the need to rebalance data from all of the other instances (a time-consuming operation, depending on the size of the data)
  • Removes the faulty instance from the Couchbase cluster and replaces it with a new instance, ensuring that the cluster is back up to the desired configuration without any loss of data

To illustrate the points above, let’s look at the Couchbase cluster that we created earlier. (Note that we’ve added 10,000 documents to the default bucket.)

Now, let’s delete the pod cb-example-0002 to see how the cluster behaves.

The Couchbase Operator notices that a node has gone down and waits for the node to be automatically failed over:

As per the CouchbaseCluster template, the Couchbase Operator expects three server pods to be running at all times. In this example scenario, since the GlusterFS persistent volumes for the deleted pod cb-example-0002 already exist, the Couchbase Operator brings up a new pod with the same name (cb-example-0002) and binds it to those same persistent volumes. The Couchbase Operator then triggers the rebalance process to add the newly created pod into the cluster.

Notice that the bucket count remains the same after rebalance, indicating that there is no data loss.

Summary

We hope that we’ve given you a good overall taste of using the Couchbase Autonomous Operator on Red Hat OpenShift. We summarized the Couchbase Operator architecture and walked through how to use the OpenShift Operator Framework to more easily deploy a Couchbase cluster with persistent volumes. We even demonstrated the self-healing capabilities of the Couchbase Operator by deleting a pod and watching the Couchbase Operator spin up a new pod in its place and rebalance it into the cluster. These are just a few of the things you can do with Couchbase on OpenShift.