Increasing Speed and Data Volume in the Must-gather Process
August 30, 2021 | by
Must-gather is a tool for gathering logs and other relevant information from OpenShift clusters, which is helpful for debugging. The must-gather tool itself is generic for all projects running on OpenShift, but each project can have one or more must-gather images that gather information relevant for a given purpose (such as logs, CRs, and metrics).
In Migration Toolkit for Virtualization (MTV), we use must-gather to collect debugging information and find the root cause of migration issues. The most relevant information gathered by must-gather is the Custom Resources (CRs), which contain migration data and related inventory such as VMs or data volumes, together with the controller logs.
The gathered information is very valuable for identifying the root cause of failures and for providing background information for customer tickets or upstream issues. However, the amount of data captured by must-gather can become huge. As a result, must-gather takes a long time to run and produces large archives. Archives that are gigabytes in size are awkward to upload as customer ticket attachments, and searching for relevant data within a large archive is complicated and time-consuming.
The Crane/MTC team in Konveyor introduced parallelization of must-gather tasks. Commands in the image are executed in the background, and the main script waits until all gathering has finished. This decreased the time needed to execute must-gather, but the resulting archive remained as big as before.
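The background-and-wait pattern described above can be sketched in a few lines of shell. This is only an illustration of the technique, not the Crane/MTC script itself: the gather_task function and the task names are hypothetical stand-ins for real collection commands.

```shell
#!/bin/sh
# Minimal sketch of parallel gathering: start every task in the background,
# then let the main script wait until all of them have finished.
# gather_task and the task names are hypothetical placeholders.
out_dir="$(mktemp -d)"

gather_task() {
    # $1 = task name; a real task would call oc or copy log files here
    sleep 1
    echo "collected $1" > "$out_dir/$1.txt"
}

for task in crs logs metrics; do
    gather_task "$task" &    # run each task in the background
done

wait    # block until every background task has exited
ls "$out_dir"
```

Because the three tasks run concurrently, the whole script takes roughly as long as the slowest task instead of the sum of all tasks. The amount of collected data is unchanged.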
To avoid gathering less relevant data, keep archives smaller, and make must-gather runs faster, we implemented “targeted gathering.”
The oc adm must-gather command accepts a custom command as an argument, so we used it to pass parameters to the must-gather execution. The parameters are environment variables specific to our VM migration use cases. The forklift-must-gather image contains the script /usr/bin/targeted, which handles these parameters and filters the objects captured into the resulting must-gather archive.
Objects are filtered according to their type. The targeted gathering script knows the relations between CRs and adds only the relevant ones to the archive. Logs are filtered with a grep command whose search query is constructed from CR types, names, and IDs, which results in smaller files with more precise, targeted data.
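The log filtering idea can be illustrated with a small, self-contained sketch. It is not the actual /usr/bin/targeted script: the identifiers, the log format, and the way the query is joined are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch: build a grep query from object identifiers and keep only matching
# log lines. All names and the log format below are hypothetical.
plan_name="plan-a"
vm_id="vm-3345"

# Join the identifiers into one extended-regex alternation
query="${plan_name}|${vm_id}"

# Sample controller log used only for this demonstration
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
level=info msg="reconcile started" plan=plan-a
level=info msg="reconcile started" plan=plan-b
level=error msg="disk transfer failed" vm=vm-3345
level=info msg="inventory refreshed" provider=vsphere
EOF

# Keep only the lines that mention one of the identifiers
filtered="$(grep -E "$query" "$log_file")"
printf '%s\n' "$filtered"
```

Only the lines mentioning plan-a or vm-3345 survive, so the filtered log is a fraction of the original. Note that grep matches substrings, so an identifier such as vm-3345 would also match vm-33456.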
Targeted gathering can, for example, collect Forklift debugging information for only the VM vm-3345 from the namespace ns-test-1. More examples can be found in the Forklift must-gather project README.
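A sketch of what such a targeted invocation could look like follows. The image location and the NS and VM variable names are assumptions here, and RUN_GATHER is a hypothetical guard added for this example; the project README is the authoritative source for the supported parameters.

```shell
#!/bin/sh
# Sketch of a targeted must-gather run for one VM in one namespace.
# Assumed, not confirmed by this article: the image location and the
# NS / VM variable names. RUN_GATHER is a hypothetical safety switch.
image="quay.io/konveyor/forklift-must-gather:latest"
namespace="ns-test-1"
vm="vm-3345"

# Everything after "--" is the custom command run inside the gathering pod;
# the parameters are passed as environment variables in front of the script.
cmd="oc adm must-gather --image=${image} -- NS=${namespace} VM=${vm} /usr/bin/targeted"

# Print the command so it can be reviewed first
echo "$cmd"

# Execute only when explicitly requested and when oc is installed
if [ "${RUN_GATHER:-0}" = "1" ] && command -v oc >/dev/null 2>&1; then
    $cmd
fi
```

Printing the command by default keeps the sketch safe to run on a machine without cluster access; setting RUN_GATHER=1 on a logged-in workstation would start the gathering.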
Narrowing the results keeps must-gather archives smaller and faster to generate while still providing the most relevant information. That makes it easier to get debugging information from customers or upstream community members. This way of parameterizing must-gather could be helpful to other projects that use OpenShift must-gather with their own gathering images.