Today Kubernetes 1.8 was released and plenty of exciting features have been introduced or graduated to general availability—from RBAC to CRI-O, the lightweight container runtime for Kubernetes, to extensibility of kubectl via plugins. In this post, I’m going to cover a feature that I’ve been working on and that shipped with Kubernetes 1.8.

Ever since horizontal pod autoscaling (HPA) on CPU usage was stabilized in Kubernetes 1.2, one of the most commonly requested features has been support for scaling on other metrics, particularly application metrics.

Support for autoscaling on arbitrary metrics, including application metrics, debuted in alpha in Kubernetes 1.6. In Kubernetes 1.8, we've graduated it to beta, so the API should be enabled by default for you to try out (see the note below).
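
One quick way to check that the new API version is available on your cluster is to list the registered API versions:

kubectl api-versions | grep autoscaling

The output should include autoscaling/v2beta1 alongside autoscaling/v1.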

Let's take a look at different parts of an example application, and how to use the new autoscaling features to scale them. Our application will consist of a web frontend, an in-memory database, and workers that process background jobs.

Application 1: The Frontend

Let's start with our nice web frontend. Based on performance testing, we have a pretty good handle on how many connections each application pod can handle before having performance issues, and we'd like to write an autoscaler to scale on that.

Let's take a look at our old CPU-based HPA that we've been relying on to handle scaling thus far:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: frobinator-frontend
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

Let's fetch that as autoscaling/v2beta1 so we can add in our new connections metric.
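
One way to do so, assuming a 1.8 kubectl, is to ask for the resource at an explicit group and version:

kubectl get horizontalpodautoscalers.v2beta1.autoscaling frontend-scaler -o yaml

The object comes back converted to the new form: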

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: frobinator-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80

Our HPA starts out the same: it still has a scaleTargetRef pointing to our application, as well as a minimum and maximum. However, instead of a single targetCPUUtilizationPercentage, we now have a list of metric specs containing one resource metric spec. A resource metric is any metric which corresponds to the resources that you can set in requests and limits. Just like before, we can specify our target as a percentage of the requested value, this time using the targetAverageUtilization field.
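
To make the utilization math concrete, suppose (hypothetically) that each frontend pod requests 500m of CPU and is using 400m on average:

utilization = average usage / requested value
            = 400m / 500m
            = 80%

A pod averaging 400m of actual usage would thus sit exactly at our 80% target.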

Now, let's add in concurrent connections as a metric to scale on:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: frobinator-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Pods
    pods:
      metricName: current_connections
      targetAverageValue: 10

Because metrics is a list, we can specify multiple metrics to scale on. In this case, we provide targets for both CPU and connections, and the HPA controller will use whichever one gives the largest replica count. We've also used a new type of metric spec: pod metrics.

A pod metric is any non-resource metric that describes each pod in our target application. Just like with resource metrics, we take the value for each running pod and average them together to figure out our scaling ratio. With pod metrics, however, we compare against a raw value: there is no request value to divide by, so we can't specify the target as a percentage.
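
Roughly speaking (ignoring details like readiness and tolerance), the controller computes:

desiredReplicas = ceil(currentReplicas * currentAverageValue / targetAverageValue)

For example, if our 3 frontend pods were averaging 15 connections each against the target of 10 above, the controller would ask for ceil(3 * 15 / 10) = 5 replicas.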

Application 2: The In-Memory Database

Now on to our in-memory database, which backs our frontend. Our in-memory database is designed to scale around memory usage. While the autoscaling/v1 version of the HorizontalPodAutoscaler (HPA) was limited to scaling on the CPU resource metric, the autoscaling/v2beta1 version can scale on any resource provided by the new resource metrics API. For the moment, this includes CPU and memory.

Let's write an HPA to scale our database:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: mem-db-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: mem-db
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 1G

Notice that we've specified memory in terms of absolute value: whereas before we could only scale in terms of percentage-of-request (a.k.a. utilization), we can now specify an absolute value instead, using the familiar "quantity" notation from requests and limits.
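
As a refresher, these quantities accept the same suffixes as requests and limits, for example:

targetAverageValue: 1G     # decimal: 10^9 bytes
targetAverageValue: 1Gi    # binary: 2^30 bytes
targetAverageValue: 1500M  # decimal: 1.5 * 10^9 bytes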

Application 3: The Workers

Finally, let's autoscale our job processing workers. Each worker can process one job at a time. We want to make certain that we keep the rate of jobs coming into our backlog roughly equal to the rate of jobs being removed from the backlog for processing. To do this, we'll assume we have a metric called processing_ratio. Either our application exports it directly, or we have something like a Prometheus recording rule which calculates it based on other metrics about the backlog.
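
For instance, a hypothetical Prometheus 2.0 recording rule for this might look like the following (the jobs_inserted_total and jobs_processed_total metric names are invented for illustration):

groups:
- name: job-queue.rules
  rules:
  - record: processing_ratio
    expr: rate(jobs_inserted_total[5m]) / rate(jobs_processed_total[5m])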

Let's write an HPA which scales based on processing_ratio:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: job-processors-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: job-processors
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Object
    object:
      target:
        kind: Deployment
        name: work-queue-manager
      metricName: processing_ratio
      targetValue: 1

Here, we use a third kind of metric: object metrics. Unlike the other two metric types, this type does not (necessarily) refer to a metric describing the pods in our target application. Instead, it refers to the processing_ratio metric, which describes our work-queue-manager application. The HPA controller uses this value directly to compute the scale ratio, instead of averaging it as it does with the pods and resource metric types.
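
To see exactly what the HPA controller sees, you can query the custom metrics API for the metric yourself (a sketch, assuming everything lives in the default namespace):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/deployments.apps/work-queue-manager/processing_ratio"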

Is it Working?

Now that we have all of our HorizontalPodAutoscalers set up, we should check if they're working. We can use the autoscaling/v2beta1 API to check to see if anything is wrong with our HPAs, through the status.conditions field. This field works similarly to the conditions field in other Kubernetes objects, such as pods or nodes.
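
Besides fetching the full object, the conditions should also show up in a friendlier form in kubectl describe output (assuming a 1.8 client):

kubectl describe hpa frontend-scaler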

Let's take a look at the status of our frontend HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  ...
status:
  observedGeneration: 10
  lastScaleTime: <some recent time>
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      currentAverageUtilization: 80
      currentAverageValue: 400m
  conditions:
  - type: AbleToScale
    status: "True"
    reason: SucceededGetScale
    lastTransitionTime: <just now>
    message: "the HPA controller was able to get the target's current scale"
  - type: ScalingActive
    status: "False"
    reason: FailedGetPodsMetric
    message: "the HPA was unable to compute the replica count: <some error>"
    lastTransitionTime: <just now>

Whoops! Looks like we forgot to properly set up our metrics pipeline, because the conditions indicate that we were able to fetch the current scale of our frontend (AbleToScale), but could not fetch the metric (ScalingActive).
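
One way to dig further is to query the custom metrics API directly and see whether it's serving anything at all:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1

An error (or an empty resource list) here means the metrics pipeline isn't wired up yet; see the note at the end of this post.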

Let's fix that, and look again:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  ...
status:
  observedGeneration: 10
  lastScaleTime: <some recent time>
  currentReplicas: 10
  desiredReplicas: 10
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      currentAverageUtilization: 80
      currentAverageValue: 400m
  conditions:
  - type: AbleToScale
    status: "True"
    reason: SucceededRescale
    lastTransitionTime: <just now>
    message: "the HPA controller was able to update the target scale to 10"
  - type: ScalingActive
    status: "True"
    reason: ValidMetricFound
    message: "the HPA was able to successfully calculate a replica count from current_connections"
    lastTransitionTime: <just now>
  - type: ScalingLimited
    status: "True"
    reason: TooManyReplicas
    message: "the desired replica count is greater than the maximum replica count."
    lastTransitionTime: <just now>

Now that we've fixed our metrics pipeline, we can see that the HPA is working properly, but would scale up further if it could. Let's fix that too by increasing our maxReplicas value.
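
One way to raise the cap in place (20 here is just a hypothetical new maximum) is kubectl patch:

kubectl patch hpa frontend-scaler --patch '{"spec": {"maxReplicas": 20}}'

With that applied, let's check one more time: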

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  ...
status:
  observedGeneration: 10
  lastScaleTime: <some recent time>
  currentReplicas: 10
  desiredReplicas: 10
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      currentAverageUtilization: 80
      currentAverageValue: 400m
  conditions:
  - type: AbleToScale
    status: "False"
    reason: BackoffBoth
    lastTransitionTime: <just now>
    message: "the time since the previous scale is still within both the downscale and upscale forbidden windows"
  - type: ScalingActive
    status: "True"
    reason: ValidMetricFound
    message: "the HPA was able to successfully calculate a replica count from current_connections"
    lastTransitionTime: <just now>
  - type: ScalingLimited
    status: "False"
    reason: DesiredWithinRange
    message: "the desired replica count is within the acceptable range"
    lastTransitionTime: <just now>

Since we just scaled, the HPA tells us that it won't scale again until we're outside the HPA controller's forbidden window, but that the desired replica count is now back within the acceptable range, so we're all set.
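
If the forbidden windows don't suit your workload, they can be tuned with flags on the controller manager (the values shown here are, to the best of my knowledge, the 1.8 defaults):

--horizontal-pod-autoscaler-upscale-delay=3m0s
--horizontal-pod-autoscaler-downscale-delay=5m0s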

What's in Store

Now that version 2 of the HPA API has been moved to beta, we're hoping to graduate the API to stable in Kubernetes 1.9. In the meantime, we'd love to hear feedback from users. Feel free to reach out to SIG Autoscaling via our mailing list, or join our meetings.

Note: Enabling Custom Metrics

While the autoscaling/v2beta1 API is enabled by default in Kubernetes 1.8, a few extra steps are needed to set up support for scaling on custom metrics.

First, you'll need a metrics pipeline that provides the custom metrics API (custom.metrics.k8s.io/v1beta1). See the k8s.io/metrics repo for a list of known implementations, which should each have their own setup instructions.
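
Once an adapter is running and registered with the aggregator, you can confirm the registration with something like:

kubectl get apiservice v1beta1.custom.metrics.k8s.io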

Once you've got a provider of the API, you'll need to make sure that your Kubernetes controller manager is set up to consume it. For Kubernetes 1.8, you'll need to pass --horizontal-pod-autoscaler-use-rest-clients=true. In Kubernetes 1.9, this will be on by default.

Check out the post on garbage collection for more info on Kubernetes 1.8.

Happy scaling!

