One of our customers configured OpenShift's log store to send a copy of its monitoring data to an external Elasticsearch cluster. After an incident in this customer's environment caused part of the data on the external cluster to be lost, we needed a way to copy the missing data back through a backup and restore process.

There are several ways to back up Elasticsearch data, as well as to transfer it directly from one Elasticsearch cluster to another, which gives you good flexibility to handle different scenarios. We will look at some of them in detail in this article.

[Image: Missing data on the external Elasticsearch cluster]

Requirements

A route to the internal OCP Elasticsearch (the steps to create this route are covered below).

The following package must be installed on one of the external Elasticsearch hosts:

  • Node.js Elasticsearch Dump.

NOTE: In this article, the selected host was the Kibana host, as it already has Node.js packages installed.

Assumptions

The acronym "ES" means "Elasticsearch".

For ease of understanding, assume the following URLs as a base for the example environment:

  • Internal OCP ES route: elasticsearch-openshift-logging.apps.homelab.rhbrlabs.com
  • External ES cluster: http://es.rhbrlabs.com:9202

Access Token

We will need a user token with access to Elasticsearch from OpenShift.

OCP Bastion

The following procedures must be run from within the OpenShift Bastion.

Get Internal ES IP

Internally, you can access the log store service using the log store cluster IP, which you can get by using either of the following commands:

[root@bastion ~]# echo $(oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging)
172.31.58.140

Or:

[root@bastion ~]# oc get service elasticsearch -n openshift-logging
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.31.58.140   <none>        9200/TCP   15d

Expose Log Store

To expose the log store externally, first switch to the "openshift-logging" project:

[root@bastion ~]# oc project openshift-logging

Extract the CA certificate from the log store and write it to the admin-ca file:

[root@bastion ~]# oc extract secret/elasticsearch --to=. --keys=admin-ca
admin-ca

Route for the Log Store Service

Create a YAML internal-es-route.yaml file with the following content:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: |

Attention: the file must end with the "|" character, and the indentation must be preserved.

Now, run the following command to add the log store CA certificate to the route YAML you created in the previous step:

[root@bastion ~]# cat ./admin-ca | sed -e "s/^/      /" >> internal-es-route.yaml

Check if the file is similar to this:

[root@bastion ~]# cat internal-es-route.yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: |
      -----BEGIN CERTIFICATE-----
      MIIFNzCCAx+gAwIBAgIUTXUBAGG84VFHUe3/o7tY5z3T9ZowDQYJKoZIhvcNAQEL
      BQAwKzEpMCcGA1UEAwwgb3BlbnNoaWZ0LWNsdXN0ZXItbG9nZ2luZy1zaWduZXIw
      HhcNMjIwMzA3MjAxMzM4WhcNMjcwMzA2MjAxMzM4WjArMSkwJwYDVQQDDCBvcGVu
      c2hpZnQtY2x1c3Rlci1sb2dnaW5nLXNpZ25lcjCCAiIwDQYJKoZIhvcNAQEBBQAD
(...)
      RJm3HFBqgu4zNf+dReKiJBZqdTaVFRJqDgRwWX7vA31S7DTadPM6VcPxm0YxqK++
      7dAEfqVkrD3bj46324AwUXCExIKvR/vRd20y1PD2gaONkDssaebfCTHi8MP17GcE
      cDGmWbKqHuSQLwCbk0ogVSwNFOdqsMOS5rYvdalIHE2l+DOFeuo6OM6/zsE/1hTD
      DZt6md8mkvXmUpK34Wtl46utmguv6fBZ6hb3O+NMOe8zOPa8GV/HU5E5Ew==
      -----END CERTIFICATE-----

Create the route:

[root@bastion ~]# oc create -f internal-es-route.yaml
route.route.openshift.io/elasticsearch created

Get the user token:

[root@bastion ~]# token=$(oc whoami -t)
[root@bastion ~]# echo $token
sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw

Tip: Alternatively, to use the elasticsearch service account token:

$ oc sa get-token elasticsearch
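
Note: On newer OpenShift versions, where "oc sa get-token" has been deprecated in favor of bound tokens, the equivalent command is:

$ oc create token elasticsearch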

Set the elasticsearch route you created as an environment variable:

[root@bastion ~]# routeES=$(oc get route elasticsearch -o jsonpath={.spec.host})
[root@bastion ~]# echo $routeES
elasticsearch-openshift-logging.apps.homelab.rhbrlabs.com

To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

[root@bastion ~]# curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}"
{
  "name" : "elasticsearch-cdm-1tsq0edh-2",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "muRod1gOSlKkVPkj6QSqFA",
  "version" : {
    "number" : "6.8.1",
    "build_flavor" : "oss",
    "build_type" : "zip",
    "build_hash" : "db90ff8",
    "build_date" : "2022-02-02T20:21:15.875200Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

External ES Kibana

The following procedures must be run from within the External Elasticsearch Kibana.

Check Access to OCP's ES

From the external ES Kibana host, declare a variable containing the token:

[root@kibana ~]# token=sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw

Declare a variable containing the OCP ES route hostname:

[root@kibana ~]# routeES='elasticsearch-openshift-logging.apps.homelab.rhbrlabs.com'

Check the communication between External ES and Internal OCP's ES:

[root@kibana ~]# curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}"
{
  "name" : "elasticsearch-cdm-1tsq0edh-1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "muRod1gOSlKkVPkj6QSqFA",
  "version" : {
    "number" : "6.8.1",
    "build_flavor" : "oss",
    "build_type" : "zip",
    "build_hash" : "db90ff8",
    "build_date" : "2022-02-02T20:21:15.875200Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

OK, you now have access to OCP's internal ES. Proceed to the next part.

Install Elasticdump

On the external Kibana host, use npm to install elasticdump:

[root@kibana ~]# npm install elasticdump
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN deprecated querystring@0.2.0: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
(...)
npm WARN root No description
npm WARN root No repository field.
npm WARN root No README data
npm WARN root No license field.
+ elasticdump@6.82.1
added 111 packages from 194 contributors and audited 111 packages in 13.671s
(...)
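
If you want to confirm the installation, you can list the package with npm:

[root@kibana ~]# npm ls elasticdump
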
Data Copy

The following procedures must be run from within the External Elasticsearch Kibana.

Index List

Get a list of available indexes:

[root@kibana ~]# curl -XGET --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/_aliases?pretty"
{
  ".kibana_1" : {
    "aliases" : {
      ".kibana" : { }
    }
  },
  ".kibana_92668751_admin_1" : {
    "aliases" : {
      ".kibana_92668751_admin" : { }
    }
  },
  ".security" : {
    "aliases" : { }
  },
  "app-000001" : {
    "aliases" : {
      ".all" : { },
      "app" : { },
      "app-write" : {
        "is_write_index" : false
      },
      "logs.app" : { }
    }
  },
  "app-000002" : {
    "aliases" : {
      ".all" : { },
      "app" : { },
      "app-write" : {
        "is_write_index" : false
      },
      "logs.app" : { }
(...)

Let's copy a small portion of application data from OCP's ES to a file, for example, the app-000001 index.

  • We will need at least the analyzer, mapping, and data structure information.

Perform a Backup

In this example, we will show you how to back up the analyzer, mapping, and data types.

Note: Available index data types:

  • settings
  • analyzer
  • data
  • mapping
  • policy
  • alias
  • template
  • component_template
  • index_template
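
Tip: The next three sections dump the analyzer, mapping, and data types of the app-000001 index one at a time. If you prefer, a simple shell loop can run all three in sequence; a minimal sketch, reusing the token and routeES variables declared earlier:

[root@kibana ~]# for type in analyzer mapping data; do NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump --headers="{\"authorization\": \"Bearer ${token}\"}" --input=https://${routeES}/app-000001 --output=${type}.json --type=${type}; done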

Analyzer Backup

[root@kibana ~]# NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump --headers='{"authorization": "Bearer sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw"}' --input=https://${routeES}/app-000001 --output=analyzer.json --type=analyzer

If everything has been set up correctly, you will see messages similar to these:

Wed, 23 Mar 2022 19:56:35 GMT | starting dump
Wed, 23 Mar 2022 19:56:35 GMT | got 1 objects from source elasticsearch (offset: 0)
Wed, 23 Mar 2022 19:56:35 GMT | sent 1 objects to destination file, wrote 1
Wed, 23 Mar 2022 19:56:35 GMT | got 0 objects from source elasticsearch (offset: 1)
Wed, 23 Mar 2022 19:56:35 GMT | Total Writes: 1
Wed, 23 Mar 2022 19:56:35 GMT | dump complete

Mapping Backup

[root@kibana ~]# NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump --headers='{"authorization": "Bearer sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw"}' --input=https://${routeES}/app-000001 --output=mapping.json --type=mapping

If everything has been set up correctly, you will see messages similar to these:

Wed, 23 Mar 2022 20:00:12 GMT | starting dump
Wed, 23 Mar 2022 20:00:12 GMT | got 1 objects from source elasticsearch (offset: 0)
Wed, 23 Mar 2022 20:00:12 GMT | sent 1 objects to destination file, wrote 1
Wed, 23 Mar 2022 20:00:12 GMT | got 0 objects from source elasticsearch (offset: 1)
Wed, 23 Mar 2022 20:00:12 GMT | Total Writes: 1
Wed, 23 Mar 2022 20:00:12 GMT | dump complete

Data Backup

[root@kibana ~]# NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump  --headers='{"authorization": "Bearer sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw"}' --input=https://${routeES}/app-000001 --output=data.json --type=data

If everything has been set up correctly, you will see messages similar to these:

Wed, 23 Mar 2022 20:00:43 GMT | starting dump
Wed, 23 Mar 2022 20:00:43 GMT | got 0 objects from source elasticsearch (offset: 0)
Wed, 23 Mar 2022 20:00:43 GMT | Total Writes: 0
Wed, 23 Mar 2022 20:00:43 GMT | dump complete
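
Tip: To restore from these dump files later, reverse the --input and --output arguments and point the output at the target index, importing analyzer and mapping before data. A minimal sketch, assuming the external ES endpoint used later in this article (http://es.rhbrlabs.com:9202) is the destination:

[root@kibana ~]# for type in analyzer mapping data; do /root/node_modules/elasticdump/bin/elasticdump --input=${type}.json --output=http://es.rhbrlabs.com:9202/app-000001 --type=${type}; done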

Direct Copy

In this step, we will do an inline data copy directly from the internal OCP ES to the external ES. Let's assume we want to copy the entire app-write index to a newly deployed cluster, or that we need to restore missing data on the external ES cluster.

TIP: If you need to copy specific indexes, generate a list and copy them individually. To generate a list of index names, run the following command:

# INDEXLIST=$(curl -s -XGET --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/_aliases?pretty" | grep app- | cut -d ":" -f1 | grep -v app-write | tr -d "\"")
# echo $INDEXLIST
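
With the list in hand, a shell loop can dump each index to its own file. A minimal sketch, reusing the token and routeES variables and backing up the data type only:

# for i in ${INDEXLIST}; do NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump --headers="{\"authorization\": \"Bearer ${token}\"}" --input=https://${routeES}/${i} --output=${i}.json --type=data; done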

The app-write index has the following attributes:

  • settings
  • mappings
  • data

Attention: You will need to check your index attributes to be able to do a proper copy.

  • In our example, all types have been included to make the process easier.

To perform a direct data copy from OCP's ES to External ES:

[root@kibana ~]# NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/elasticdump  --headers='{"authorization": "Bearer sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw"}' --input=https://${routeES}/app-write --output=http://es.rhbrlabs.com:9202/ --includeType settings,analyzer,data,mapping,policy,alias,template,component_template,index_template

You should see messages like these, indicating that the copy is working:

(...)
Thu, 24 Mar 2022 00:29:07 GMT | sent 100 objects to destination elasticsearch, wrote 100
Thu, 24 Mar 2022 00:29:07 GMT | got 100 objects from source elasticsearch (offset: 568700)
Thu, 24 Mar 2022 00:29:07 GMT | sent 100 objects to destination elasticsearch, wrote 100
Thu, 24 Mar 2022 00:29:07 GMT | got 100 objects from source elasticsearch (offset: 568800)
Thu, 24 Mar 2022 00:29:08 GMT | sent 100 objects to destination elasticsearch, wrote 100
Thu, 24 Mar 2022 00:29:08 GMT | got 100 objects from source elasticsearch (offset: 568900)
(...)

At the end of the copy, we can see the updated data on the External ES cluster.

[Image: Updated data on the external ES cluster]

  • IMPORTANT: When backing up very large indexes that take hours to complete, I recommend using tmux so that an SSH disconnection does not interrupt the copy, as shown below.
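
For example, run the copy inside a detachable session:

[root@kibana ~]# tmux new -s esdump

Run the elasticdump command inside the session, detach with Ctrl-b d, and reattach later with:

[root@kibana ~]# tmux attach -t esdump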

TIP: To select a specific time range, filters can be used. Look at the examples below.

Example 1:

--searchBody '{"query":{"bool":{"must":[{"range":{"@timestamp":{"from":"2022-03-16","to":"2022-03-21"}}}],"filter":[{"match_all":{}}],"should":[],"must_not":[]}}}'

Example 2:

--searchBody '{"query":{"bool":{"must":[{"range":{"@timestamp":{"gte":"now-1d/d","lte":"now/d"}}}],"filter":[{"match_all":{}}],"should":[],"must_not":[]}}}'

Parallel Backups

This mode runs multiple processes in parallel to speed up the backup. If not specified, the number of parallel processes is automatically set to the number of available CPU cores.

  • This backup mode cannot be used for an inline copy between two ES clusters.

In the following example, we will do a parallel backup of the app-infra index:

[root@kibana ~]# NODE_TLS_REJECT_UNAUTHORIZED=0 /root/node_modules/elasticdump/bin/multielasticdump  --headers='{"authorization": "Bearer sha256~0JtosvhtA7YwbTx-UPlhRknHLydgP1Iov2YIVN_duCw"}' --input=https://${routeES}/app-infra --output=/destination/backup --includeType settings,analyzer,data,mapping,policy,alias,template,component_template,index_template
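
To cap the number of worker processes instead of letting it default to the CPU count, you can pass the --parallel option to multielasticdump; for example, adding --parallel=4 to the command above limits the backup to four processes.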

Finishing

In this article, we saw how easy it is to copy data stored in OpenShift's Elasticsearch to another cluster and to perform backup dumps from the Elasticsearch database.

As a last recommendation, we suggest adopting security measures to control access to the route created for OCP's internal ES.


About the author

Andre Rocha is a Consultant at Red Hat focused on OpenStack, OpenShift, RHEL, and other Red Hat products. He has been at Red Hat since 2019 and previously worked as a DevOps engineer and sysadmin for private companies.
