Introduction

Many organizations use resource quotas on OpenShift namespaces to control the amount of CPU and memory consumed. Without such constraints, a single developer could inadvertently consume a huge amount of resources, leaving other teams constrained. OpenShift platforms can automatically scale, adding nodes when the current capacity cannot cope with the workload demands placed upon it. However, this is not always necessary, and expensive resources may be wasted. To manage a fair and equitable allocation of resources, quotas can be used to constrain development teams within reason and to encourage careful consideration of the resources requested by microservices.

OpenShift Pipelines is a Continuous Integration/Continuous Delivery (CI/CD) solution based on the open source Tekton project. The key objective of Tekton is to enable development teams to quickly create pipelines of activity from simple, repeatable steps. A characteristic that differentiates Tekton from previous CI/CD solutions is that each step executes within a container created specifically for that step. This provides a degree of isolation that fosters predictable and repeatable task execution, and it means development teams do not have to manage a shared build server instance. Additionally, Tekton components are Kubernetes resources, so their management, scheduling, monitoring, and removal are delegated to the platform. Pods created to run Tekton tasks consume resources in exactly the same manner as any other workload. However, a problem exists with the allocation of CPU and memory resources for Tekton tasks, and the resolution of that problem is the purpose of this article.

Resource allocation and requests

Resource quota

Resource quotas provide a limit on the total amount of resources that can be consumed by all currently active pods within the namespace.

Limit range

A limit range provides a limit on the amount of resources that a single container can consume within the namespace. The limit range values should, of course, be lower than the resource quota limits.

Limits and requests

Resource constraints are specified in terms of limits and requests. A request defines the minimum amount of resources reserved for the container, which the scheduler guarantees. The limit defines the maximum amount of resources that the container is allowed to consume.

Example deployment

An example of a deployment that has resource limits and requests is shown below:

      - name: liberty-rest
        image: quay.io/marrober/layers:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 9080
          name: http
          protocol: TCP
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

The above example will consume between 64 and 128 mebibytes (MiB) of RAM, and it will use between 250 and 500 millicores of CPU.

Note that 1 mebibyte = 2^20 bytes (1,048,576 bytes), so to be accurate, the memory used will be between 67,108,864 bytes (64 x 1,048,576) and 134,217,728 bytes (128 x 1,048,576).

For CPU, the specification of 250 to 500 millicores means that the container will use between ¼ and ½ of a virtual CPU or core on a worker node.

Tekton tasks and containers

When a Tekton pipeline is executed, a pod is created for each task defined within the pipeline. Within that pod, a container is created and executed for each step within the task. The relationship between these entities is shown in figure 1 below:

Figure 1: Task/step to pod/container relationship

In addition to the above containers within the pod, at least one init container will also be created. It is possible to define the resource allocations for the containers associated with the steps using the process shown below, but it is not possible to allocate specific resources to the init containers without the solution shown in this article.

Defining resource requirements in steps

When resource quotas are defined on a project, every container that is created must have a specification section that defines its resource limits and requests. For a Tekton task, this is simple to do, as shown in the example task below:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: task-1
spec:
  steps:
    - name: step-1
      image: registry.access.redhat.com/ubi8/ubi:8.3-297
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
      script: |
        #!/usr/bin/env sh
        ls -al /

The above shows that the step will consume between 0.1 and 0.2 of a virtual CPU or core on a worker node and between 104,857,600 and 209,715,200 bytes of memory.

The example task used for this post actually contains three steps, each with a specific allocation of resources, as indicated below:

Step 1

  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 200m
      memory: 100Mi

Step 2

  resources:
    limits:
      cpu: 200m
      memory: 1000Mi
    requests:
      cpu: 200m
      memory: 512Mi

Step 3

  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 100Mi

Even though the steps are sequenced to run one after the other in the pipeline process, the containers that deliver each of the steps all start at the same time. The resource quota allocation must therefore be greater than the total of all of the individual values shown above, which is:

  resources:
    limits:
      cpu: 600m
      memory: 1400Mi
    requests:
      cpu: 500m
      memory: 712Mi

The containers indicated in figure 1 are not the only containers created when a task is executed. Additional containers, called init containers, are also created, and these are at the center of the issue.

The problem demonstrated

The source assets for this scenario can be found here. Clone or download the YAML content in this directory if you wish to observe the behavior described for yourself, or alternatively, read the detailed explanation below.

The namespace to be used for testing is ‘limit-testing’; however, you can edit the YAML definitions to change the namespace to anything that is suitable.

The OpenShift Pipelines Operator needs to be installed on a cluster to perform the tasks shown below.

Create the namespace

Use the oc command to create the namespace:

oc new-project limit-testing

Create the Tekton resources

oc create -f task-1.yaml
oc create -f test-pipeline.yaml
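
The task-1.yaml file contains the task shown earlier. The test-pipeline.yaml file is not reproduced in this article; the sketch below shows a minimal pipeline of the kind it might contain, assuming a pipeline named test-pipeline that simply references the task. Only the task name (task-1) and the namespace come from the article; everything else is illustrative:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: test-pipeline        # assumed name; match the name in your test-pipeline.yaml
  namespace: limit-testing
spec:
  tasks:
    - name: task-1
      taskRef:
        name: task-1         # the task created from task-1.yaml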

Test pipeline execution

Perform a test execution of the pipeline while there are no resource quota or limit range constraints in place:

oc create -f test-pipeline-run.yaml
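
The test-pipeline-run.yaml file is also not reproduced here. A minimal sketch of what it might contain is shown below; the generateName prefix is an assumption based on the run names reported by tkn later (such as test-pr-qcjsv), and the pipeline name matches the sketch above:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: test-pr-     # produces run names such as test-pr-qcjsv
  namespace: limit-testing
spec:
  pipelineRef:
    name: test-pipeline      # assumed pipeline name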

Locate the running pipeline; initially it will show a status of Running:

tkn pipelinerun list

NAME            STARTED          DURATION   STATUS
test-pr-qcjsv   10 seconds ago   ---       Running


Wait a minute or so, and then repeat the above command to show the completed status:

tkn pipelinerun list   
           
NAME            STARTED          DURATION     STATUS
test-pr-qcjsv   32 seconds ago   27 seconds   Succeeded

Display the log for the pipeline execution, which will show directory listings (reduced here for readability) and the execution of uname. If you are performing these tests on your own OpenShift platform, you need to change the pipeline run names to those reported in your environment:

tkn pipelinerun logs test-pr-qcjsv
[task-1 : step-1] total 0
[task-1 : step-1] dr-xr-xr-x.   1 root root        87 Mar 24 09:09 .
[task-1 : step-1] dr-xr-xr-x.   1 root root        87 Mar 24 09:09 ..
[task-1 : step-1] lrwxrwxrwx.   1 root root         7 Apr 23  2020 bin -> usr/bin
[task-1 : step-1] dr-xr-xr-x.   2 root root         6 Apr 23  2020 boot
[task-1 : step-2] Linux test-pr-qcjsv-task-1-gdwj4-pod-shxf5 4.18.0-305.34.2.el8_4.x86_64 #1 SMP Mon Jan 17 09:42:23 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
[task-1 : step-3] total 0
[task-1 : step-3] dr-xr-xr-x.   1 root root        87 Mar 24 09:09 .
[task-1 : step-3] dr-xr-xr-x.   1 root root        87 Mar 24 09:09 ..
[task-1 : step-3] lrwxrwxrwx.   1 root root         7 Apr 23  2020 bin -> usr/bin

Clearly the pipeline has been executed successfully.

Introduce the resource quota

Create the resource quota using the command:

oc create -f resource-quota.yaml
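
The resource-quota.yaml file is not reproduced in this article. The sketch below illustrates the kind of quota involved; the quota name (example) matches the name referenced in the failure message shown later, while the hard limits are illustrative assumptions only:

kind: ResourceQuota
apiVersion: v1
metadata:
  name: example              # name referenced in the failure message later in this article
  namespace: limit-testing
spec:
  hard:
    limits.cpu: "2"          # illustrative values, not taken from the repository
    limits.memory: 4Gi
    requests.cpu: "1"
    requests.memory: 2Gi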

Test pipeline execution

Perform a test execution of the pipeline now that the resource quota constraint is in place:

oc create -f test-pipeline-run.yaml

Examine the running pipeline as before and observe that the pipeline execution has failed:

tkn pipelinerun list 

NAME            STARTED          DURATION     STATUS
test-pr-rkjpp   12 minutes ago   0 seconds   Failed

The pipeline run has failed to execute successfully.

Examining the problem

When the pipeline run log is examined, as shown below, it shows that the pod could not be created because a container did not have valid resource limits. This can be confusing, as the step definitions for the task clearly do have resource constraints:

tkn pipelinerun logs test-pr-rkjpp

task task-1 has failed: failed to create task run pod "test-pr-rkjpp-task-1-8q6f7": pods "test-pr-rkjpp-task-1-8q6f7-pod-n9kzf" is forbidden: failed quota: example: must specify limits.cpu,limits.memory,requests.cpu,requests.memory. Maybe missing or invalid Task limit-testing/task-1
pod for taskrun test-pr-rkjpp-task-1-8q6f7 not available yet

Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 0

Examine the pod for the successful task

Further information can be found by examining the pod for the successful execution of the pipeline. The command below will show the pod within the project for the pipeline execution:

oc get pod

NAME                                   READY   STATUS      RESTARTS   AGE
test-pr-qcjsv-task-1-gdwj4-pod-shxf5   0/3     Completed   0          21m

The containers within the pod need to be examined in more detail. The command below identifies how many containers are in the pod:

oc get pod/test-pr-qcjsv-task-1-gdwj4-pod-shxf5 \
-o jsonpath='{.spec.containers[*].name}' | wc -w
     3

The response of three matches the number of steps in the task.

The command below identifies how many init containers are in the pod:

oc get pod/test-pr-qcjsv-task-1-gdwj4-pod-shxf5 \
-o jsonpath='{.spec.initContainers[*].name}' | wc -w
     2

The init containers are outside the direct control of the author of the task and pipeline. Examining the resource constraints of each container indicates that the step containers match the resource allocations in the task definition:

oc get pod/test-pr-qcjsv-task-1-gdwj4-pod-shxf5 \
-o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{.resources}{"\n"}'
step-step-1
{"limits":{"cpu":"200m","memory":"200Mi"},"requests":{"cpu":"200m","memory":"100Mi"}}
step-step-2
{"limits":{"cpu":"200m","memory":"1000Mi"},"requests":{"cpu":"200m","memory":"512Mi"}}
step-step-3
{"limits":{"cpu":"200m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}

When the same command is repeated for the init containers, no resource requests or limits are seen:

oc get pod/test-pr-qcjsv-task-1-gdwj4-pod-shxf5 \
-o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{.resources}{"\n"}'
place-tools
{}
place-scripts
{}

The lack of limits on the init containers causes the issue when the resource quota is in place.

The solution

The way to overcome the problem is to apply a limit range resource to the project alongside the resource quota. A limit range applies constraints to individual containers, while the resource quota applies limits cumulatively to all containers running in the project.

Apply the limit range

Apply the limit range using the command below:

oc apply -f limit-range.yaml

The limit range specification is shown below:

kind: LimitRange
apiVersion: v1
metadata:
  name: limit-range
  namespace: limit-testing
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 500m
        memory: 256Mi

Test pipeline execution

Perform a test execution of the pipeline when there are both resource quota and limit range constraints in place:

oc create -f test-pipeline-run.yaml

Examine the running pipeline as before and observe that the pipeline execution has completed successfully:

tkn pipelinerun list 

NAME            STARTED          DURATION     STATUS
test-pr-tn22w   27 minutes ago   6 seconds   Succeeded

Examine the limits applied to the containers in the pipeline task pod. The step containers have the resource constraints applied correctly as before:

oc get pod/test-pr-tn22w-task-1-j8gqt-pod-mbxn5 \
-o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{.resources}{"\n"}'

step-step-1
{"limits":{"cpu":"200m","memory":"200Mi"},"requests":{"cpu":"200m","memory":"100Mi"}}

step-step-2
{"limits":{"cpu":"200m","memory":"1000Mi"},"requests":{"cpu":"200m","memory":"512Mi"}}

step-step-3
{"limits":{"cpu":"200m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}

The init containers now have resource constraints applied to them, taken from the limit range. The limits come from the default values within the limit range, and the requests are a fraction of those limits:

oc get pod/test-pr-tn22w-task-1-j8gqt-pod-mbxn5 \
-o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{.resources}{"\n"}'
place-tools
{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"125m","ephemeral-storage":"0","memory":"64Mi"}}
place-scripts
{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"125m","ephemeral-storage":"0","memory":"64Mi"}}

Summary

If you want to use Tekton pipelines in a project that has a resource quota applied, it is necessary to also create a limit range so that the init containers have limits placed upon them as well.
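
As a quick recap, assuming the file names used throughout this article, both constraints can be applied to the project together:

oc apply -f resource-quota.yaml -f limit-range.yaml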