Introduction

When it comes to monitoring workloads we all think of the same trio: logs, metrics and alerts. For most applications this is everything you need, if used properly. For some, however, there is a missing piece: core dumps.

While there is good support and plenty of applications and techniques for common monitoring, there is nothing widespread for core dumps, and most approaches end up being tailored, one-off solutions. So, what is a core dump and why is it useful?

According to the manual, this is the official definition:
The default action of certain signals is to cause a process to terminate and produce a core dump file, a file containing an image of the process's memory at the time of termination. This image can be used in a debugger to inspect the state of the program at the time that it terminated.

Any program that receives an unhandled signal whose default action is to dump core (e.g. SIGABRT, SIGSEGV) will be terminated, and the kernel will start the core dump procedure, which consists of taking a snapshot of the process memory and writing it to a file. After this procedure is complete, the process finally dies with an error.
What makes a core dump so special is precisely having access to the memory map, which contains the state of the program at the moment it crashed. Using a debugger (typically gdb) we can open the core dump alongside the program binary, allowing full-fledged inspection of the issue that terminated the process.
If your application was killed by one of these signals, metrics and logging might not be enough and we need to resort to core dumps, as they contain fine-grained information about the ultimate reason for the failure. Keep in mind that core dumps may pose security risks (they are a full memory map of the process, potentially including sensitive data) and can cause resource contention under high system load.
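As a quick illustration, here is a minimal sketch of triggering and inspecting a core dump on a workstation. It assumes core dumps are enabled (ulimit -c unlimited) and that kernel.core_pattern is a plain file name such as core rather than a pipe; the PID and core file name shown are illustrative.

$ ulimit -c unlimited
$ sleep 600 &
[1] 4242
$ kill -SEGV %1
[1]+  Segmentation fault      (core dumped) sleep 600
# the core file name and location depend on kernel.core_pattern
$ gdb -q /usr/bin/sleep ./core.4242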

Core dump configuration

If you are already familiar with configuring core dumps you can skip this section.

Core dumps are configured at OS level, and we can control where, how and whether cores are generated using a simple set of commands and configuration values. By default, core dumps are disabled because the files take up space (remember they are memory dumps, so they are as large as the memory of the crashing process) and, more importantly, may contain sensitive data.

There are two ways to configure core dumps: using systemd, or manually by configuring kernel parameters.

When not using systemd we can configure core dumps with ulimit, the shell interface to the limits persisted in /etc/security/limits.conf. In that file we find a long list of parameters, each of them limiting a different system resource. Each parameter has two limits: hard and soft.
Soft limits are kernel-enforced values for the corresponding resource. Any user may modify the soft limit for their own session, and this type of limit cannot exceed the hard limit.
Hard limits are the upper ceiling for the corresponding resource. A hard limit acts as the maximum value for the soft limit and can only be raised by root.
Here we can see a simple example of listing all limits and then changing the soft limit for core dumps:

# list all limits. These default to soft limits.
$ ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62832
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 62832
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

# Limit a core dump file size to 1024 blocks. Since this
# is running in a shell and defaults to the soft limit, it
# only changes the limit for this particular session.
$ ulimit -c
unlimited
$ ulimit -c 1024
$ ulimit -c
1024
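Changes made with ulimit vanish when the session ends. To persist them, the limits can be raised in /etc/security/limits.conf (or a drop-in under /etc/security/limits.d/); a minimal sketch with illustrative values:

# /etc/security/limits.conf
# <domain>  <type>  <item>  <value>
*           soft    core    unlimited
*           hard    core    unlimited

# verify the hard limit in a new session
$ ulimit -H -c
unlimited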

There are three additional kernel parameters that affect core dumps: fs.suid_dumpable, kernel.core_pattern and kernel.core_pipe_limit. These may be changed by using sysctl.

The first one, fs.suid_dumpable, controls whether a privileged process (for example one carrying the setuid bit) is allowed to dump core at all. Its possible values are:

  • 0: Disabled.
  • 1: Enabled.
  • 2: Enabled with restrictions. The resulting core is only readable by root.

The second one, kernel.core_pattern, controls where the core dump will be written. It typically looks like a path with some special variables (% specifiers for pid, time, binary name, etc.). It can also start with a pipe (|), meaning the core dump will be passed on stdin to the program named right after the | symbol. This method ignores the core size limit we saw before. For example, this is a regular Fedora system:

$ cat /proc/sys/kernel/core_pattern 
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

Any core dump will be piped to /usr/lib/systemd/systemd-coredump with the following parameters:

  • %P: PID of the dumped process, as seen from the initial PID namespace.
  • %u: Numeric UID of the dumped process.
  • %g: Numeric GID of the dumped process.
  • %s: Signal causing the dump.
  • %t: Time of dump as seconds from the epoch.
  • %c: Core file soft resource limit of dumped process.
  • %h: Hostname.

You can see the full list of parameters in the manual.

The third one, kernel.core_pipe_limit, controls how many crashing processes may be piped to a user-space program in parallel when dumping cores. When a process crashes, its /proc/{PID} directory may include relevant information that we want to retrieve. To do this safely the kernel must wait for the program collecting the dump (the one after the pipe) to exit, so as not to remove the directory prematurely. A misbehaving collector could hang forever and block the cleanup of the crashed process.
If the limit is exceeded, the crashing process is noted in the kernel log and the core is skipped. A value of 0 means an unlimited number of processes may be captured in parallel.
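On a regular RHEL or Fedora host, these three parameters can be changed at runtime with sysctl -w and persisted with a drop-in under /etc/sysctl.d/. A minimal sketch with illustrative values and paths (remember kernel.core_pipe_limit only matters when core_pattern is a pipe):

# runtime changes, lost on reboot
$ sudo sysctl -w fs.suid_dumpable=2
$ sudo sysctl -w kernel.core_pipe_limit=16
$ sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t

# persist across reboots
$ sudo tee /etc/sysctl.d/90-coredump.conf <<'EOF'
fs.suid_dumpable = 2
kernel.core_pipe_limit = 16
kernel.core_pattern = /var/crash/core.%e.%p.%t
EOF
$ sudo sysctl --system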

When using systemd, core dumps are handled by a service called systemd-coredump, and the related sysctl configuration ships in /usr/lib/sysctl.d/50-coredump.conf. If you inspect that file you will see the parameters described above, set to route core dumps to systemd-coredump.

Core dumps in OpenShift

We have seen how to set up core dumps on a RHEL system, but what happens when we run on OpenShift? Core dumps will still be produced on each host where the pods are scheduled, but this poses several challenges:

  • Pods may run on any node. If there is a crash, how do we retrieve the core?
  • Multiple nodes may have crashing processes. How and where do we aggregate the cores?
  • Each node may have a different core dump configuration.
  • OpenShift uses RHCOS, which means sysctl changes won’t survive a reboot as they are not persisted.

All these questions have the same answer: a core dump handler that takes care of those tasks. To handle core dumps uniformly we need the following steps:

  • Run an agent on all nodes from which we want to be able to retrieve cores.
  • Have the agent configure the core dump generation homogeneously on all nodes. This requires privileged access.
  • Back up the original configuration so it can be restored when the handler is uninstalled.
  • Set watches for core dumps in the target directory where they will be written.
  • Send core dumps to persistent storage accessible outside of the cluster.

There is a component that does exactly these steps for us: the core dump handler from IBM. It is an open source operator not maintained by Red Hat; for any questions or problems related to this operator, please reach out to the core dump handler community. Its main characteristics are:

  • Runs as a daemonset, which automatically handles node updates in the cluster.
  • Runs as root. It needs privileged access to be able to mutate node configuration. However, it performs backups to restore the system when uninstalled.
  • Works with S3 compatible storage. Optional plugins may be added to handle different storage solutions. For more information check this entry in the FAQ.
  • Plenty of configuration options to customize behavior.
  • Does not work with systemd core dump configurations; it operates on kernel parameters directly.

If you need more details you can check the README, as it is well documented in both architecture and usage.

This is a layer on top of the configuration we have already seen; it is there only to make core dump retrieval accessible and manageable. Let's see a simple example of how to do all this in an actual OpenShift cluster.

Working example

A sample application

For the purpose of the example we are going to use an already available application that produces a core dump some time after starting. One possible way of producing a core dump is to write to an invalid memory address: this raises a segmentation fault signal and, when the process does not handle it, triggers a core dump.
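For reference, such an application boils down to something like the following sketch (crash.c is just an illustrative name): it writes to a NULL pointer, the kernel raises SIGSEGV, and since the program does not handle the signal a core is dumped.

$ cat > crash.c <<'EOF'
/* writing to an invalid address raises SIGSEGV, whose default action dumps core */
int main(void) {
    *(volatile int *)0 = 42;
    return 0;
}
EOF
$ gcc -g -o crash crash.c && ./crash
Segmentation fault (core dumped)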

Setting up Cloud Object Storage

For Cloud Object Storage we are going to use Amazon’s S3 (Simple Storage Service), which provides object storage through a web service interface.
Assuming you already have a working AWS account, you need to set up the basic IAM access key and secret. If you have not done so, you can follow these instructions.
Remember to store your credentials somewhere you can access them; you will need them later.

Now we can go over to S3 to configure a bucket, if you haven't already. To set up a new S3 bucket you may follow these instructions.

An important note: core dumps typically contain user data, so they are sensitive to breaches and can leak private information. To harden your core dumps you can and should enable bucket encryption when creating the bucket. Any data accessible to any process running on any node of your cluster can end up in a core dump, so securing access to and storage of core dumps is essential to maintaining the security of the cluster and of all the applications and data accessible through it.
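A minimal sketch of creating such a bucket with the AWS CLI, with default encryption enabled; the bucket name and region are placeholders:

$ aws s3api create-bucket --bucket my-coredump-bucket --region eu-west-3 \
    --create-bucket-configuration LocationConstraint=eu-west-3
$ aws s3api put-bucket-encryption --bucket my-coredump-bucket \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'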

From here you will need the region and the bucket's name for later steps, on top of the IAM access key and secret.

The cluster

For this example we are going to use an OpenShift 4.10.9 cluster in AWS.

$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-129-82.eu-west-3.compute.internal Ready worker 4h31m v1.23.5+9ce5071
ip-10-0-139-43.eu-west-3.compute.internal Ready master 4h41m v1.23.5+9ce5071
ip-10-0-161-223.eu-west-3.compute.internal Ready master 4h41m v1.23.5+9ce5071
ip-10-0-190-212.eu-west-3.compute.internal Ready worker 4h35m v1.23.5+9ce5071
ip-10-0-196-149.eu-west-3.compute.internal Ready master 4h40m v1.23.5+9ce5071
ip-10-0-222-0.eu-west-3.compute.internal Ready worker 4h31m v1.23.5+9ce5071

Let’s check the current configuration for core dumps.

$ oc debug node/ip-10-0-129-82.eu-west-3.compute.internal 
Starting pod/ip-10-0-129-82eu-west-3computeinternal-debug ...
To use host binaries, run chroot /host
Pod IP: 10.0.129.82
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
sh-4.4# cat /proc/sys/kernel/core_pipe_limit
0
sh-4.4# cat /proc/sys/fs/suid_dumpable
0

We see all the defaults. These apply to every node in the cluster.

Installing the handler

Since we are using core-dump-handler we are going to take advantage of the Helm chart that is readily available. The first thing we need to do is configure the repository:

$ helm repo add core-dump-handler https://ibm.github.io/core-dump-handler/
"core-dump-handler" has been added to your repositories

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "core-dump-handler" chart repository
Update Complete. ⎈Happy Helming!

Now we can either download the chart or proceed to install it. Before installing, we should take a look at the different values we have in the chart:

$ helm fetch core-dump-handler/core-dump-handler
$ tar xf core-dump-handler-v8.3.1.tgz
$ ls -l core-dump-handler
total 56
-rw-r--r--. 1 user group 1134 Apr 16 15:46 Chart.yaml
drwxrwxr-x. 1 user group 192 Apr 18 10:34 ci
-rw-r--r--. 1 user group 10576 Apr 16 15:46 README.md
drwxrwxr-x. 1 user group 420 Apr 18 10:34 templates
-rw-r--r--. 1 user group 145 Apr 16 15:46 values.aws.yaml
-rw-r--r--. 1 user group 135 Apr 16 15:46 values.do.yaml
-rw-r--r--. 1 user group 184 Apr 16 15:46 values.gke-cos.yaml
-rw-r--r--. 1 user group 65 Apr 16 15:46 values.ibm.yaml
-rw-r--r--. 1 user group 513 Apr 16 15:46 values.openshift.yaml
-rw-r--r--. 1 user group 471 Apr 16 15:46 values.roks.yaml
-rw-r--r--. 1 user group 8590 Apr 16 15:46 values.schema.json
-rw-r--r--. 1 user group 1721 Apr 16 15:46 values.yaml

We see a set of values files, each of them holding the specifics for a platform. Feel free to inspect them, even though here we will only focus on values.yaml and the parameters specific to OpenShift.
There is a schema for the general values file, stating which values are required and their formats.

For a detailed description of all parameters we can customize please refer to the documentation.

A summary of the variables we need follows:

AWS_ACCESS_KEY=<your AWS access key from IAM>
AWS_SECRET=<your AWS secret from IAM>
S3_BUCKET_NAME=<S3 bucket name. Do not include arn>
S3_BUCKET_REGION=<S3 bucket region>

The default approach for your secrets is to provide them in the Helm chart values. These get created if the manageStoreSecret variable is set to true. If, however, you want to integrate an external secret management system you can check how to do it here. For the purpose of this example we will assume manageStoreSecret is set to true.

After setting up S3 we can proceed to install the handler:

# Example from fetched chart. To install without fetching use
# “core-dump-handler/core-dump-handler”, which maps to
# <helm repo name>/<helm chart name>
$ helm install core-dump-handler ./core-dump-handler \
--create-namespace \
--namespace observe \
--set daemonset.s3AccessKey=$AWS_ACCESS_KEY \
--set daemonset.s3Secret=$AWS_SECRET \
--set daemonset.s3BucketName=$S3_BUCKET_NAME \
--set daemonset.s3Region=$S3_BUCKET_REGION \
--set composer.crioImageCmd=images \
--set scc.create=true
NAME: core-dump-handler
LAST DEPLOYED: Fri Apr 22 13:55:33 2022
NAMESPACE: observe
STATUS: deployed
REVISION: 1
NOTES:
Verifying the chart

Run a crashing container - this container writes a value to a null pointer

1. kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

2. Validate the core dump has been uploaded to your object store instance.

Let’s have a look at what changed in the nodes. There is a pod per worker node applying the configuration, as can be seen in the logs and on the nodes themselves.

$ oc get daemonset -n observe
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
core-dump-handler 3 3 3 3 3 <none> 74s

$ oc get pod -n observe -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
core-dump-handler-fdnn5 1/1 Running 0 80s 10.131.0.35 ip-10-0-190-212.eu-west-3.compute.internal <none> <none>
core-dump-handler-p57hb 1/1 Running 0 80s 10.128.2.12 ip-10-0-222-0.eu-west-3.compute.internal <none> <none>
core-dump-handler-wxw97 1/1 Running 0 80s 10.129.2.13 ip-10-0-129-82.eu-west-3.compute.internal <none> <none>

$ oc debug node/ip-10-0-129-82.eu-west-3.compute.internal
Starting pod/ip-10-0-129-82eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.129.82
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /proc/sys/kernel/core_pattern
|/var/mnt/core-dump-handler/cdc -c=%c -e=%e -p=%p -s=%s -t=%t -d=/var/mnt/core-dump-handler/cores -h=%h -E=%E
sh-4.4# cat /proc/sys/kernel/core_pipe_limit
128
sh-4.4# cat /proc/sys/fs/suid_dumpable
2

$ oc logs -n observe core-dump-handler-wxw97
[2022-04-22T11:55:39Z INFO core_dump_agent] no .env file found
That's ok if running in kubernetes
[2022-04-22T11:55:39Z INFO core_dump_agent] Setting host location to: /var/mnt/core-dump-handler
[2022-04-22T11:55:39Z INFO core_dump_agent] Current Directory for setup is /app
[2022-04-22T11:55:39Z INFO core_dump_agent] Copying the composer from ./vendor/default/cdc to /var/mnt/core-dump-handler/cdc
[2022-04-22T11:55:39Z INFO core_dump_agent] Starting sysctl for kernel.core_pattern /var/mnt/core-dump-handler/core_pattern.bak with |/var/mnt/core-dump-handler/cdc -c=%c -e=%e -p=%p -s=%s -t=%t -d=/var/mnt/core-dump-handler/cores -h=%h -E=%E
[2022-04-22T11:55:39Z INFO core_dump_agent] Getting sysctl for kernel.core_pattern
[2022-04-22T11:55:39Z INFO core_dump_agent] Created Backup of /var/mnt/core-dump-handler/core_pattern.bak
kernel.core_pattern = |/var/mnt/core-dump-handler/cdc -c=%c -e=%e -p=%p -s=%s -t=%t -d=/var/mnt/core-dump-handler/cores -h=%h -E=%E
[2022-04-22T11:55:39Z INFO core_dump_agent] Starting sysctl for kernel.core_pipe_limit /var/mnt/core-dump-handler/core_pipe_limit.bak with 128
[2022-04-22T11:55:39Z INFO core_dump_agent] Getting sysctl for kernel.core_pipe_limit
[2022-04-22T11:55:39Z INFO core_dump_agent] Created Backup of /var/mnt/core-dump-handler/core_pipe_limit.bak
kernel.core_pipe_limit = 128
[2022-04-22T11:55:39Z INFO core_dump_agent] Starting sysctl for fs.suid_dumpable /var/mnt/core-dump-handler/suid_dumpable.bak with 2
[2022-04-22T11:55:39Z INFO core_dump_agent] Getting sysctl for fs.suid_dumpable
[2022-04-22T11:55:39Z INFO core_dump_agent] Created Backup of /var/mnt/core-dump-handler/suid_dumpable.bak
fs.suid_dumpable = 2
[2022-04-22T11:55:39Z INFO core_dump_agent] Creating /var/mnt/core-dump-handler/.env file with LOG_LEVEL=Warn
[2022-04-22T11:55:39Z INFO core_dump_agent] Writing composer .env
LOG_LEVEL=Warn
IGNORE_CRIO=false
CRIO_IMAGE_CMD=images
USE_CRIO_CONF=false
FILENAME_TEMPLATE={uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}
LOG_LENGTH=500

[2022-04-22T11:55:39Z INFO core_dump_agent] Executing Agent with location : /var/mnt/core-dump-handler/cores
[2022-04-22T11:55:39Z INFO core_dump_agent] Dir Content []
[2022-04-22T11:55:39Z INFO core_dump_agent] INotify Starting...
[2022-04-22T11:55:39Z INFO core_dump_agent] INotify Initialised...
[2022-04-22T11:55:39Z INFO core_dump_agent] INotify watching : /var/mnt/core-dump-handler/cores

Here we can see all the changes the handler has performed in each of the workers. It creates a backup for all the parameters it replaced: core_pattern, core_pipe_limit and suid_dumpable. With this backup it is able to restore the original values when uninstalled.
Note that core_pattern was set to a pipe, meaning there is no need to change the core size limit, as we saw earlier.

Uninstalling the handler

Uninstalling the handler is as easy as running one command:

$ helm delete  core-dump-handler -n observe
release "core-dump-handler" uninstalled

We saw before that the handler stores backup files for the old core_pattern and related configuration values so it can restore them when deleted. This is that occasion, so we should now see the configuration as it was before installing:

$ oc debug node/ip-10-0-251-195.us-west-2.compute.internal
Starting pod/ip-10-0-251-195us-west-2computeinternal-debug ...
To use host binaries, run chroot /host
Pod IP: 10.0.251.195
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Dumping cores

Now that everything is set up we may proceed to trigger core dumps and see what happens. We will use the command and application suggested in the chart notes:

$ oc run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never -n observe
Logging a message 1 from segfaulter
Logging a message 2 from segfaulter
Logging a message 3 from segfaulter
Logging a message 4 from segfaulter
Logging a message 5 from segfaulter
Logging a message 6 from segfaulter
Logging a message 7 from segfaulter
Logging a message 8 from segfaulter
Logging a message 9 from segfaulter
...
Logging a message 998 from segfaulter
Logging a message 999 from segfaulter
pod observe/segfaulter terminated (Error)

The pod is now terminated, as we can see with a quick get:

$ oc get pod -n observe segfaulter
NAME READY STATUS RESTARTS AGE
segfaulter 0/1 Error 0 35s

And checking the status we can confirm the faulty exit code:

$ oc get pod -n observe segfaulter -o jsonpath='{.status.containerStatuses[*].state}' | jq
{
  "terminated": {
    "containerID": "cri-o://0cc97cff4433372a5768ffc1fd1689f63e11f2fc9429b17e22087046c621eed7",
    "exitCode": 132,
    "finishedAt": "2022-04-22T12:06:01Z",
    "reason": "Error",
    "startedAt": "2022-04-22T12:06:00Z"
  }
}

A core dump has been generated, so we may now look into the logs of the handler:

$ oc logs -n observe core-dump-handler-wxw97
...
[2022-04-22T12:06:00Z INFO core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
[2022-04-22T12:06:00Z INFO core_dump_agent] zip size is 29774
[2022-04-22T12:06:01Z INFO core_dump_agent] S3 Returned: 200

We see the handler grabbed the file from the host path /var/mnt/core-dump-handler/cores/26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip and sent it to S3 storage. We can also check by running:

$ aws s3 ls $S3_BUCKET_NAME
2022-04-22 14:06:02 29774 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip

Dumping system cores

We now know that core_pattern is configured at the kernel level and spans all core dumps within a host, so any process dumping a core will follow the same configuration. This means that not only our application, but any process running on the same host, will have its core dumps written to S3. This includes other workloads, OCP and Kubernetes components, and every program running on the cluster's nodes.
To prove this, let's make a regular program, outside OpenShift, dump a core.

First we get a shell in a node:

$ oc debug node/ip-10-0-251-195.us-west-2.compute.internal
Starting pod/ip-10-0-251-195us-west-2computeinternal-debug ...
To use host binaries, run chroot /host
Pod IP: 10.0.251.195
If you don't see a command prompt, try pressing enter.
sh-4.4#

Now we run an example program:

sh-4.4# sleep 3600 &
[1] 272564

And now we force the application to dump a core by artificially sending an abort signal to the process:

sh-4.4# kill -ABRT 272564
sh-4.4#
[1]+ Aborted (core dumped) sleep 3600

And there we have it: a core was dumped, and we should see it both in the handler's logs and in S3:

$ oc logs -n observe core-dump-handler-4667h
...
[2022-04-22T12:11:57Z INFO core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/e1ce523e-75e1-4d88-87c6-5014b33dacbc-dump-1650629517-ip-10-0-129-82-sleep-272564-6.zip
[2022-04-22T12:11:57Z INFO core_dump_agent] zip size is 25893
[2022-04-22T12:11:57Z INFO core_dump_agent] S3 Returned: 200


$ aws s3 ls $S3_BUCKET_NAME
2022-04-22 14:06:02 29774 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
2022-04-22 14:11:58 25893 e1ce523e-75e1-4d88-87c6-5014b33dacbc-dump-1650629517-ip-10-0-129-82-sleep-272564-6.zip

Retrieving cores

Using S3 storage

We should now have the core dump in the storage; let's download it and inspect the contents:

$ aws s3 cp s3://$S3_BUCKET_NAME/26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
download: s3://pacevedo-test/26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip to ./26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip

$ ls -l 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
-rw-rw-r--. 1 pacevedo pacevedo 29774 Apr 22 14:06 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip

$ unzip 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
Archive: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.zip
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-dump-info.json
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.core
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-pod-info.json
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-runtime-info.json
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-ps-info.json
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-0.log
inflating: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-0-image-info.json

Before we continue let’s have a look at the files in the zip. We can see the following (excluding the common prefix):

  • dump-info.json: Contains information about when and where the core was dumped: binary name within the container, container hostname, pid, signal causing the core dump, timestamp and node hostname.
  • core: The actual core dump, which you can open with a debugger.
  • pod-info.json: Includes information about the pod in which the container was running. Includes annotations, labels and some metadata.
  • runtime-info.json: Includes information about the container runtime variables and environment.
  • ps-info.json: Includes information about labels and annotations for each container in the pod, among other metadata.
  • log: Crashing container logs.
  • image-info.json: Information about the container image.

For example, we can see in the dump-info.json which node generated the core:

$ cat 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4-dump-info.json | jq -r '.node_hostname'
ip-10-0-129-82

$ oc debug node/ip-10-0-129-82.eu-west-3.compute.internal -- hostname
Starting pod/ip-10-0-129-82eu-west-3computeinternal-debug ...
To use host binaries, run chroot /host
ip-10-0-129-82

For the sake of the example let’s open the core dump to see how it’s done. We already have the core; now we need the application binary, which we can extract from the container image.

# Image from the image-info.json
$ podman pull quay.io/icdh/segfaulter@sha256:6afa4cc864ac2249d2fd981626a54f4a8fb6ca9b088eb17b8b5d540cb3f2296b
Trying to pull quay.io/icdh/segfaulter@sha256:6afa4cc864ac2249d2fd981626a54f4a8fb6ca9b088eb17b8b5d540cb3f2296b...
Getting image source signatures
Copying blob 97518928ae5f done
Copying blob 81e74cbd6df2 done
Copying blob 9e5a2f66f30e done
Copying config 40a435cf6e done
Writing manifest to image destination
Storing signatures
40a435cf6e5d7c1521048ace810bb088296749bc5bb061657cafb9f5eacbaae8

Now we inspect the image to find the segfaulter binary:

$ podman inspect quay.io/icdh/segfaulter@sha256:6afa4cc864ac2249d2fd981626a54f4a8fb6ca9b088eb17b8b5d540cb3f2296b | jq -r '.[].GraphDriver.Data.UpperDir'
/home/pacevedo/.local/share/containers/storage/overlay/5837ab9a5fb2185c354670ba9201829b9f8d8c570f3842be748d36dab71e9419/diff

We can find the segfaulter binary in that directory. Now we only need to start gdb with both the coredump and the original program.
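As an aside, an alternative to digging through the overlay storage is to copy the binary out of a created (not running) container with podman cp; a small sketch, where segfaulter-tmp is just an illustrative container name:

$ podman create --name segfaulter-tmp quay.io/icdh/segfaulter@sha256:6afa4cc864ac2249d2fd981626a54f4a8fb6ca9b088eb17b8b5d540cb3f2296b
$ podman cp segfaulter-tmp:/usr/local/bin/segfaulter ./segfaulter
$ podman rm segfaulter-tmp

The gdb session below uses the overlay path directly; with podman cp you would point gdb at ./segfaulter instead.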

$ gdb /home/pacevedo/.local/share/containers/storage/overlay/5837ab9a5fb2185c354670ba9201829b9f8d8c570f3842be748d36dab71e9419/diff/usr/local/bin/segfaulter 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.core
GNU gdb (GDB) Fedora 11.1-5.fc34
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/pacevedo/.local/share/containers/storage/overlay/5837ab9a5fb2185c354670ba9201829b9f8d8c570f3842be748d36dab71e9419/diff/usr/local/bin/segfaulter...
Illegal process-id: 26bab97c-032f-49a7-9617-e329c20fff20-dump-1650629160-segfaulter-segfaulter-1-4.core.

warning: Can't open file /usr/local/bin/segfaulter during file-backed mapping note processing
[New LWP 1]
Core was generated by `segfaulter'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x00007f59c4741e72 in segfaulter::main () at src/main.rs:10

Using rsync

In order to use rsync we need the core dump to still be in the cluster. This means we need to use either intervals or schedules (and copy the core before the next sync moves it away), or disable S3 uploads entirely.

To do so, we need to set any of those variables when installing the handler. For example, let’s disable INotify and S3 and get the core via rsync:

$ helm install core-dump-handler . --create-namespace --namespace observe --set daemonset.manageStoreSecret=false --set scc.create=true --set coreStorage=1Mi --set daemonset.useINotify=false
NAME: core-dump-handler
LAST DEPLOYED: Thu May 26 08:40:50 2022
NAMESPACE: observe
STATUS: deployed
REVISION: 1
NOTES:
Verifying the chart

Run a crashing container - this container writes a value to a null pointer

1. kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

2. Validate the core dump has been uploaded to your object store instance.

# Let's dump a core again
$ kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never
Logging a message 1 from segfaulter
...
pod default/segfaulter terminated (Error)

# Check where the segfaulter pod was running to use the daemon set pod in the same node.
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
segfaulter 0/1 Error 0 8m31s 10.128.2.16 ip-10-0-172-131.eu-west-1.compute.internal <none> <none>

# rsync is not present in the core-dump-agent image. For the purpose of this example we added it to the running container. This should be done when building the image.
$ oc exec -it -n observe core-dump-handler-5tb6s -- bash -c "microdnf install rsync -y"
(microdnf:15): librhsm-WARNING **: 10:06:45.506: Found 0 entitlement certificates
(microdnf:15): librhsm-WARNING **: 10:06:45.518: Found 0 entitlement certificates
Downloading metadata...
Downloading metadata...
Downloading metadata...
Package Repository Size
Installing:
rsync-3.1.3-12.el8.x86_64 ubi-8-baseos 414.5 kB
Transaction Summary:
Installing: 1 packages
Reinstalling: 0 packages
Upgrading: 0 packages
Obsoleting: 0 packages
Removing: 0 packages
Downgrading: 0 packages
Downloading packages...
Running transaction test...
Installing: rsync;3.1.3-12.el8;x86_64;ubi-8-baseos
Complete.

$ oc rsync -n observe core-dump-handler-5tb6s:/var/mnt/core-dump-handler/cores /tmp
receiving incremental file list
cores/
cores/5006e40b-67ba-4b8a-b936-8efa2020b0da-dump-1653379564-segfaulter-segfaulter-1-4.zip

sent 47 bytes received 29,946 bytes 19,995.33 bytes/sec
total size is 29,749 speedup is 0.99

Risks and mitigations

The handler uses a persistent volume and persistent volume claim for storing core dumps. You may customize the storageClass (defaults to hostclass) to suit your storage provider, but keep in mind that once this PVC is full, core dumps won’t be written until there is enough space again.

For example, imagine we have a 1Gi PVC and it is already full. If we try to dump again, this is what we get:

# ls -l
total 980388
-rw-r--r--. 1 root root 0 May 26 10:33 364800be-07d3-432e-acdd-bb9080ee1769-dump-1653561199-segfaulter-segfaulter-1-4.zip
-rw-r--r--. 1 root root 1003896832 May 26 10:18 heavy-file

We see an empty file where the core should be. This is obviously unusable, and the failure easily goes unnoticed. To fix it we can either:

  • Resize the PVC (see the sketch after this list). Follow these instructions. Keep in mind that the original Helm chart values will need to be updated to align with the new size!
  • Redeploy with an updated size for the PVC.
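A minimal sketch of the resize, assuming the PVC created by the chart is named core-volume-claim in the observe namespace (check the actual name with oc get pvc) and that the storage class allows volume expansion:

$ oc get pvc -n observe
$ oc patch pvc core-volume-claim -n observe \
    -p '{"spec":{"resources":{"requests":{"storage":"5Gi"}}}}'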

Always remember to monitor your free space in your volumes!

We have seen the handler configure core_pattern with a pipe, meaning the maximum core size setting does not apply. This should be handled with care, as big core dumps could worsen resource usage and contention in the system.

Cores are first passed as a stream of bytes to the handler, then written to the persistent volume, and then sent to the storage uploader. This means you need both enough free space in the PVC and enough bandwidth to send the core to the storage solution. The bigger the core dump, the longer it takes to write it to disk, read it back and send it to storage. Creating high pressure on disk may affect running applications, as they could start buffering and put pressure on memory. We cannot disregard the network either: a large number of core dumps (or very large ones) can also create traffic spikes when using external storage.

Another important aspect is the core_pipe_limit setting. As we saw earlier, this parameter dictates how many cores can be dumped simultaneously. When set to 0 it means unlimited, which might be problematic in big systems or applications. As we have seen in this section, dumping cores has a high cost in terms of disk, network and potentially memory too. If we don’t limit the number of cores that can be dumped at the same time, and we have a high number of crashing processes in the application (or even outside it; remember the core dump configuration is system wide!), we might put too much pressure on resources and make things worse. The handler comes with a preset value of 128 (non-customizable) for core_pipe_limit. You should check whether this is enough for your application’s requirements.

To mitigate this we have a few options in the handler: schedule and interval.

Schedule is a cron-formatted string that we can customize to our needs. Following the schedule we configure, the agent will sweep the cores directory and send any cores found to the storage uploader.

Interval follows a similar pattern, but is more restricted: you specify the number of milliseconds to wait between resyncs to the storage uploader.

These two configuration values limit the bandwidth usage when uploading core dumps. Keep in mind they are mutually exclusive; you can only use one of them.
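For example, assuming the chart exposes these settings as daemonset.schedule and daemonset.interval (verify the exact names against the values schema of your chart version), switching from INotify to a scheduled sweep could look like this sketch:

$ helm upgrade core-dump-handler core-dump-handler/core-dump-handler \
    --namespace observe \
    --reuse-values \
    --set daemonset.useINotify=false \
    --set daemonset.schedule="0 */6 * * *"   # illustrative cron expression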

If using S3, we also need to keep an eye on the usage: if we don’t perform some rotation the bucket will grow indefinitely. This is not provided by the handler, so remember to set it up if using this storage solution.
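One option is an S3 lifecycle rule that expires old dumps automatically; a minimal sketch with an illustrative bucket name and retention period:

$ aws s3api put-bucket-lifecycle-configuration --bucket my-coredump-bucket \
    --lifecycle-configuration '{"Rules":[{"ID":"expire-cores","Status":"Enabled","Filter":{"Prefix":""},"Expiration":{"Days":30}}]}'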