Background

Open Cluster Management (OCM) is a community-driven project focused on multicluster and multicloud scenarios for Kubernetes applications. In OCM, the multicluster scheduling capabilities are provided by Placement. As we discussed in the previous article, Using the Open Cluster Management Placement for Multicluster Scheduling, you can use Placement to filter clusters by label or claim selector. Placement also provides some default prioritizers, which can be used to sort clusters and select the most suitable ones.

Two of the default prioritizers are ResourceAllocatableCPU and ResourceAllocatableMemory. They provide the capability to sort clusters based on the allocatable CPU and memory. However, when considering resource-based scheduling, the limitation is that "AllocatableCPU" and "AllocatableMemory" are static values that don't change even if the cluster is running out of resources. In some cases, the prioritizer also needs extra data to calculate the score of a managed cluster. For example, you might want to schedule based on resource monitoring data from the cluster. For these reasons, we need a more extensible way to support scheduling based on customized scores.

The following features introduced in this article are based on Open Cluster Management v0.7.0 and also delivered in Red Hat Advanced Cluster Management for Kubernetes 2.5.

What is Placement extensible scheduling?

OCM Placement introduces the AddOnPlacementScore API to support scheduling based on customized scores. This API stores the customized scores, which Placement can then consume. For more details on the definition of AddOnPlacementScore, see types_addonplacementscore.go. See the following AddOnPlacementScore example:

apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScore
metadata:
  name: default
  namespace: cluster1
status:
  conditions:
  - lastTransitionTime: "2021-10-28T08:31:39Z"
    message: AddOnPlacementScore updated successfully
    reason: AddOnPlacementScoreUpdated
    status: "True"
    type: AddOnPlacementScoreUpdated
  validUntil: "2021-10-29T18:31:39Z"
  scores:
  - name: "cpuAvailable"
    value: 66
  - name: "memAvailable"
    value: 55
  • conditions: Contains the different condition statuses for this AddOnPlacementScore.
  • validUntil: Defines the valid time of the scores. After this time, the scores are considered to be invalid by placement. Nil means no expiration. The controller owning this resource should keep the scores up-to-date.
  • scores: Contains a list of score names and values of this managed cluster. In the above example, the API contains a list of customized scores: cpuAvailable and memAvailable.

All the customized score information is stored in status, as we don't expect users to update it.

  • As a score provider, a third-party controller could run on either the hub or managed cluster to maintain the lifecycle of AddOnPlacementScore and update the score in status.
  • As a user, you need to know the resource name default and the customized score names cpuAvailable and memAvailable to specify them in the placement YAML to select clusters. For example, the following placement selects the top three clusters with the highest cpuAvailable score:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement
  namespace: ns1
spec:
  numberOfClusters: 3
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: default
          scoreName: cpuAvailable
      weight: 1
  • In Placement, if the user defines the scoreCoordinate type as AddOn, the Placement controller gets the AddOnPlacementScore resource named "default" in each cluster's namespace, reads the score "cpuAvailable" from the score list, and uses that score to sort clusters.

You can refer to the enhancements to learn more about the design. In the design, lifecycle maintenance (create, update, and delete) of the AddOnPlacementScore custom resource is not covered, as we expect the customized score provider itself to manage it. In this article, we use an example to show you how to implement a third-party controller to update your own scores and extend the multicluster scheduling capability with them.

How to implement a customized score provider

The example code is in the resource-usage-collect GitHub repository. It provides the score of the cluster's available CPU and memory, which can reflect the cluster’s real-time resource utilization. It is developed with OCM addon-framework and can be installed as an add-on plugin to update customized scores in AddOnPlacementScore. See Add-on Developer Guide to learn more about how to develop an addon.

The resource-usage-collect add-on follows the hub-agent architecture, as shown below.

(Figure: placement extensible scheduling with the resource-usage-collect add-on's hub-agent architecture)

The resource-usage-collect add-on contains a controller and an agent.

  • The resource-usage-collect-controller runs on the hub cluster. It is responsible for creating the ManifestWork for resource-usage-collect-agent in each cluster namespace.
  • On each managed cluster, the work agent watches the ManifestWork and installs the resource-usage-collect-agent. The resource-usage-collect-agent is the core part of this add-on: it creates an AddOnPlacementScore for its cluster on the hub cluster and refreshes the scores and validUntil every 60 seconds.

When the AddOnPlacementScore is ready, you can specify the customized score in a Placement to select clusters.

The workflow and logic of the resource-usage-collect add-on are easy to understand. The following steps will help you get started:

Prepare an OCM environment with 2 ManagedClusters

  1. Run the following command to prepare an environment with the setup dev environment by kind script:
curl -sSL https://raw.githubusercontent.com/open-cluster-management-io/OCM/main/solutions/setup-dev-environment/local-up.sh | bash
  2. Run the following command to confirm that two ManagedClusters and a default ManagedClusterSet were created:
$ clusteradm get clusters
NAME       ACCEPTED   AVAILABLE   CLUSTERSET   CPU   MEMORY       KUBERNETES VERSION
cluster1   true       True        default      24    49265496Ki   v1.23.4
cluster2   true       True        default      24    49265496Ki   v1.23.4

$ clusteradm get clustersets
NAME      BOUND NAMESPACES   STATUS
default                      2 ManagedClusters selected
  3. Run the following commands to bind the default ManagedClusterSet to the default namespace:
clusteradm clusterset bind default --namespace default
$ clusteradm get clustersets
NAME      BOUND NAMESPACES   STATUS
default   default            2 ManagedClusters selected

Install the resource-usage-collect add-on

  1. Run the following command to clone the source code:
git clone git@github.com:JiahaoWei-RH/resource-usage-collect.git 
cd resource-usage-collect
  2. Run the following command to prepare the image:
# get imagebuilder first
go get github.com/openshift/imagebuilder/cmd/imagebuilder@v1.2.1
export PATH=$PATH:$(go env GOPATH)/bin
# build image
make images
  3. Run the following command to deploy the resource-usage-collect add-on:
make deploy
  4. Run the following commands to verify the installation:

On the hub cluster, verify that the resource-usage-collect-controller pod is running.

$ kubectl get pods -n open-cluster-management | grep resource-usage-collect-controller
resource-usage-collect-controller-55c58bbc5-t45dh 1/1 Running 0 71s

On the hub cluster, verify that an AddOnPlacementScore is generated for each managed cluster.

$ kubectl get addonplacementscore -A
NAMESPACE   NAME                   AGE
cluster1    resource-usage-score   3m23s
cluster2    resource-usage-score   3m24s

The AddOnPlacementScore status should contain a list of scores, as follows:

$ kubectl get addonplacementscore -n cluster1 resource-usage-score -oyaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScore
metadata:
  creationTimestamp: "2022-08-08T06:46:04Z"
  generation: 1
  name: resource-usage-score
  namespace: cluster1
  resourceVersion: "3907"
  uid: 6c4280e4-38be-4d45-9c73-c18c84799781
status:
  scores:
  - name: cpuAvailable
    value: 12
  - name: memAvailable
    value: 4

If the AddOnPlacementScore is not created, or there are no scores in its status, go into the managed cluster and check whether the resource-usage-collect-agent pod is running by using the following command:

$ kubectl get pods -n default | grep resource-usage-collect-agent
resource-usage-collect-agent-5b85cbf848-g5kqm 1/1 Running 0 2m

Select clusters with the customized scores

If everything is running correctly, you can try to create a Placement and select clusters with the customized scores.

  1. Create a Placement to select one cluster with the highest cpuAvailable score.
cat << EOF | kubectl apply -f -
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement1
  namespace: default
spec:
  numberOfClusters: 1
  clusterSets:
  - default
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: resource-usage-score
          scoreName: cpuAvailable
      weight: 1
EOF
  2. Verify the Placement decision.
$ kubectl describe placementdecision -n default | grep Status -A 3
Status:
  Decisions:
    Cluster Name:  cluster1
    Reason:

Cluster1 is selected by the PlacementDecision.

Run the following commands to get the customized score in AddOnPlacementScore and the cluster score set by Placement. You can see that the cpuAvailable score is 12 in AddOnPlacementScore. This value is also the cluster score in the Placement events, which indicates that the Placement is using the customized score to select clusters.

$ kubectl get addonplacementscore -A -o=jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.status.scores}{"\n"}{end}'
cluster1 [{"name":"cpuAvailable","value":12},{"name":"memAvailable","value":4}]
cluster2 [{"name":"cpuAvailable","value":12},{"name":"memAvailable","value":4}]
$ kubectl describe placement -n default placement1 | grep Events -A 10
Events:
  Type    Reason          Age   From                 Message
  ----    ------          ----  ----                 -------
  Normal  DecisionCreate  50s   placementController  Decision placement1-decision-1 is created with placement placement1 in namespace default
  Normal  DecisionUpdate  50s   placementController  Decision placement1-decision-1 is updated with placement placement1 in namespace default
  Normal  ScoreUpdate     50s   placementController  cluster1:12 cluster2:12

Now you know how to install the resource-usage-collect add-on and consume the customized score to select clusters. Next, let's take a deeper look at some key points when you consider implementing a customized score provider.

Where to run the customized score provider

The customized score provider could run on either the hub or the managed cluster. Based on your use case, you should be able to tell whether the controller should run on the hub cluster or on each managed cluster.

In our example, the customized score provider is developed with the addon-framework, which follows the hub-agent architecture. The resource-usage-collect-agent is the real score provider. It is installed on each managed cluster, retrieves the available CPU and memory of the managed cluster, calculates a score, and updates it in AddOnPlacementScore. The resource-usage-collect-controller just takes care of installing the agent.

In other cases, for example, if you want to use the metrics from Thanos to calculate a score for each cluster, then the customized score provider only needs to be placed on the hub, as Thanos has all the metrics collected from each managed cluster.
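
For instance, a hub-side provider could fetch per-cluster metrics with the Prometheus Go client, which the Thanos query API is compatible with. The following is a minimal sketch under stated assumptions: the thanos-query address and the PromQL expression are placeholders you would replace with your own, and error handling is kept to a minimum.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
	"github.com/prometheus/common/model"
)

// clusterCPUUsage fetches one metric value for a managed cluster from a
// Thanos query endpoint. The metric name is hypothetical; use whatever
// recording rule your monitoring stack provides.
func clusterCPUUsage(ctx context.Context, cluster string) (float64, error) {
	client, err := api.NewClient(api.Config{Address: "http://thanos-query:9090"}) // placeholder address
	if err != nil {
		return 0, err
	}
	query := fmt.Sprintf(`cluster:cpu_usage:ratio{cluster=%q}`, cluster) // placeholder PromQL
	result, warnings, err := promv1.NewAPI(client).Query(ctx, query, time.Now())
	if err != nil {
		return 0, err
	}
	if len(warnings) > 0 {
		fmt.Println("query warnings:", warnings)
	}
	vector, ok := result.(model.Vector)
	if !ok || len(vector) == 0 {
		return 0, fmt.Errorf("no samples for cluster %s", cluster)
	}
	return float64(vector[0].Value), nil
}

func main() {
	usage, err := clusterCPUUsage(context.Background(), "cluster1")
	fmt.Println(usage, err)
}

The hub-side controller would then normalize the returned value to a score and update the AddOnPlacementScore in each cluster's namespace on the hub.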

How to maintain the AddOnPlacementScore CR lifecycle

In our example, the code to maintain the AddOnPlacementScore CR is in pkg/addon/agent/agent.go.

  • When should the score be created?

    The AddOnPlacementScore CR can be created together with the ManagedCluster, or created on demand to reduce the number of objects on the hub.

    In our example, the add-on creates an AddOnPlacementScore for each managed cluster if it does not exist, and calculates a score when creating the CR for the first time.

  • When should the score be updated?

    We recommend that you set validUntil when updating the score so that the Placement controller can tell whether the score is still valid in case the provider fails to update it for a long time.

    The score could be updated when your monitoring data changes, or when you need to update it before it expires.

    In our example, in addition to recalculating and updating the score every 60 seconds, an update is also triggered when the node or pod resources in the managed cluster change (see the sketch after this list).
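
The following is a minimal sketch of such an updater, assuming the generated clientset from open-cluster-management.io/api; calculateScores is a hypothetical helper standing in for the logic in pkg/addon/agent/calculate.go. A real agent would call syncScore every 60 seconds and also on node or pod changes.

package agent

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterclient "open-cluster-management.io/api/client/cluster/clientset/versioned"
	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
)

const scoreCRName = "resource-usage-score"

// calculateScores is a hypothetical stand-in for the agent's score calculation.
func calculateScores() (cpu, mem int32) { return 0, 0 }

// syncScore creates the AddOnPlacementScore CR in the cluster namespace on
// first run, then refreshes the scores and validUntil in its status.
func syncScore(ctx context.Context, client clusterclient.Interface, clusterNS string) error {
	scores := client.ClusterV1alpha1().AddOnPlacementScores(clusterNS)

	cr, err := scores.Get(ctx, scoreCRName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		cr, err = scores.Create(ctx, &clusterv1alpha1.AddOnPlacementScore{
			ObjectMeta: metav1.ObjectMeta{Name: scoreCRName, Namespace: clusterNS},
		}, metav1.CreateOptions{})
	}
	if err != nil {
		return err
	}

	cpu, mem := calculateScores()

	// Set validUntil so Placement can treat the scores as invalid if the
	// agent stops updating them.
	validUntil := metav1.NewTime(time.Now().Add(10 * time.Minute))
	cr.Status.ValidUntil = &validUntil
	cr.Status.Scores = []clusterv1alpha1.AddOnPlacementScoreItem{
		{Name: "cpuAvailable", Value: cpu},
		{Name: "memAvailable", Value: mem},
	}
	_, err = scores.UpdateStatus(ctx, cr, metav1.UpdateOptions{})
	return err
}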

How to calculate the score

The code to calculate the score is in pkg/addon/agent/calculate.go. A valid score must be in the range -100 to 100, so you need to normalize the scores before updating them in AddOnPlacementScore.

When normalizing the score, you might run into the following issues:

  • The score provider knows the max and min value of the customized scores.

    In this case, it is easy to achieve a smooth mapping by using a formula. If the actual value is X, and X is in the interval [min, max], then score = 200 * (X - min) / (max - min) - 100. For example, with min = 0, max = 100, and X = 66, the score is 200 * 66 / 100 - 100 = 32.

  • The score provider doesn't know the max and min value of the customized scores.

    In this case, you need to set a max and min value by yourself, as without a max and min value, it is not possible to map a single value X to the range [-100, 100].

When X is greater than this max value, the cluster can be considered healthy enough to deploy applications, and the score can be set as 100. And if X is less than the min value, the score can be set as -100.

if X >= max
    score = 100
if X <= min
    score = -100

In our example, the resource-usage-collect-agent running on each managed cluster doesn't have a holistic view of the CPU/memory usage of all the clusters, so we manually set the max values as MAXCPUCOUNT and MAXMEMCOUNT in the code and set the min value as 0. The score calculation formula can then be simplified as follows: score = X / max * 100.
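
To make the two cases concrete, here is a minimal sketch of both normalization approaches in Go; the function names are illustrative and not taken from the example repository.

package score

// normalize maps a value from the known interval [min, max] onto the
// [-100, 100] range that Placement requires, clamping values that fall
// outside the bounds.
func normalize(value, min, max float64) int32 {
	switch {
	case value >= max:
		return 100
	case value <= min:
		return -100
	default:
		return int32(200*(value-min)/(max-min) - 100)
	}
}

// normalizeFromZero follows the simplified formula used in our example:
// with min fixed at 0 and a manually chosen max, it maps [0, max] onto
// [0, 100] instead of the full [-100, 100] range.
func normalizeFromZero(value, max float64) int32 {
	if value >= max {
		return 100
	}
	if value <= 0 {
		return 0
	}
	return int32(value / max * 100)
}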

Summary

In this article, we introduced what Placement extensible scheduling is and used an example to show how to implement a customized score provider. This article also listed three key points the developer needs to consider when implementing a third-party score provider. After reading this article, you should have a clear view of how Placement extensible scheduling can help you extend the multicluster scheduling capabilities.

All the features introduced in this article are based on Open Cluster Management v0.7.0 and are also delivered in Red Hat Advanced Cluster Management for Kubernetes 2.5. The latest features will continue to be updated in Extend the multicluster scheduling capabilities with placement.

Feel free to ask questions in the Open-cluster-management-io GitHub community or contact us by using Slack.