How to configure Kubernetes horizontal pod autoscaling

A HorizontalPodAutoscaler lets Kubernetes adjust the replica count of a scalable workload as CPU, memory, or another supported metric changes. It suits stateless Deployments and similar controller-managed targets, where adding Pods spreads the work without changing the resources assigned to each Pod.

CPU- and memory-based autoscaling depends on the cluster's resource metrics pipeline. Metrics Server serves the metrics.k8s.io API, and utilization targets require resource requests on the target containers so that the autoscaler can express current usage as a percentage of the requested amount.

Choose replica bounds that fit node capacity, rollout surge, and availability expectations before applying an HPA to production. The target must expose the scale subresource, so a Deployment or StatefulSet is a normal target while a DaemonSet is not.

Steps to configure Kubernetes horizontal pod autoscaling:

Check that the resource Metrics API is available.
```
$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        37s
```
If AVAILABLE is not True, install or repair Metrics Server before creating a CPU- or memory-based HPA.
Related: How to install Kubernetes Metrics Server
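The same availability check can be scripted so that a pipeline stops early when the Metrics API is down. The following is a minimal sketch rather than a definitive implementation: the function name metrics_api_available is hypothetical, and it assumes kubectl is already pointed at the intended cluster.

```shell
# Succeeds (exit 0) only when the resource Metrics API reports Available=True.
# metrics_api_available is a hypothetical helper name, not part of kubectl.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}') || return 1
  [ "$status" = "True" ]
}

# Example use in a script:
#   metrics_api_available || { echo "Metrics API is not available" >&2; exit 1; }
```

Reading the Available condition directly avoids parsing the table output, whose column layout is meant for people rather than scripts.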
Check the target Deployment before attaching autoscaling.
```
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   1/1     1            1           6s
```
Use a context and namespace that point at the intended workload before changing replica automation.
Related: How to check Kubernetes cluster access
Confirm that the target container has a CPU request.
```
$ kubectl get deployment worker --namespace team-a -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}{"\n"}'
10m
```
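CPU utilization is calculated as a percentage of requested CPU. If any container in a Pod lacks a CPU request, CPU utilization is undefined for that Pod and the HPA takes no action on that metric. The command above reads only the first container, so check every container in multi-container Pods.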
Tool: Kubernetes Resource Requests Checker
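For Pods with more than one container, it can help to list every container next to its CPU request and flag the empty ones. The sketch below assumes tab-separated "name, request" lines on standard input. The helper name containers_missing_cpu_request is hypothetical, and init containers are not covered.

```shell
# Reads "container<TAB>cpu-request" lines on stdin and prints the names of
# containers whose request field is empty.
# containers_missing_cpu_request is a hypothetical helper name.
containers_missing_cpu_request() {
  awk -F '\t' '$2 == "" { print $1 }'
}

# Example use against the Deployment from this guide:
#   kubectl get deployment worker --namespace team-a \
#     -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\n"}{end}' \
#     | containers_missing_cpu_request
```

An empty result means every listed container declares a CPU request.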
Create the HPA for the target Deployment.
```
$ kubectl autoscale deployment/worker --namespace team-a --cpu=50% --min=1 --max=4
horizontalpodautoscaler.autoscaling/worker autoscaled
```
--cpu sets the average CPU utilization target, here 50 percent. --min and --max bound the replica count that the HPA may set on the Deployment.
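Where the HPA should live in version control, a declarative manifest is a common alternative to the imperative command. The sketch below prints an autoscaling/v2 manifest intended to mirror the command above (same name, namespace, target, and bounds). The wrapper function hpa_manifest is a hypothetical name used only to keep the example self-contained.

```shell
# Prints an autoscaling/v2 manifest equivalent in intent to the
# kubectl autoscale command: 50% average CPU utilization, 1 to 4 replicas.
# hpa_manifest is a hypothetical helper name.
hpa_manifest() {
  cat <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF
}

# Example use:
#   hpa_manifest | kubectl apply -f -
```

Applying a manifest rather than running kubectl autoscale makes later changes to the target or bounds reviewable as diffs.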
Give the workload enough CPU pressure for the autoscaler to calculate a new replica count.

The HPA may show <unknown> for a short time after creation while the controller waits for metrics. A disposable CPU-bound workload can make scale-out visible in a lab namespace without a separate traffic generator.
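One way to get such a disposable workload is to define the lab Deployment itself as a busy loop. The sketch below is an illustration under stated assumptions, not the setup used for the outputs in this guide: the busybox:1.36 image, the 100m CPU limit, the app: worker label, and the function name burner_manifest are all assumptions. Use it only in a throwaway namespace, because applying it replaces any existing Deployment named worker.

```shell
# Prints a Deployment whose single container spins in a shell loop.
# burner_manifest is a hypothetical helper name.
burner_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: team-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: busybox:1.36   # assumption: any image with a POSIX shell
          command: ["sh", "-c", "while true; do :; done"]
          resources:
            requests:
              cpu: 10m          # same request value as shown earlier
            limits:
              cpu: 100m         # assumption: caps the burn per Pod
EOF
}

# Example use in a lab namespace:
#   burner_manifest | kubectl apply -f -
```

Because the limit is well above the request, each Pod reports utilization far over a 50 percent target, which should push the HPA toward its maximum.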

Check the HPA after metrics are available.

```
$ kubectl get hpa worker --namespace team-a
NAME     REFERENCE           TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
worker   Deployment/worker   cpu: 1010%/50%   1         4         4          46s
```

TARGETS shows current usage against the configured target. REPLICAS shows the replica count selected by the HPA, capped by MAXPODS.
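The relationship between TARGETS and REPLICAS follows the scaling rule in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds. The sketch below reproduces only that arithmetic. It deliberately omits the controller's tolerance band, stabilization windows, and handling of unready Pods, and the function name desired_replicas is hypothetical.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Integer-only approximation of the HPA scaling rule.
# desired_replicas is a hypothetical helper name.
desired_replicas() {
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling of current * util / target.
  raw=$(( (current * util + target - 1) / target ))
  if [ "$raw" -lt "$min" ]; then raw=$min; fi
  if [ "$raw" -gt "$max" ]; then raw=$max; fi
  echo "$raw"
}

desired_replicas 1 1010 50 1 4   # prints 4 (uncapped result would be 21)
desired_replicas 4 20 50 1 4     # prints 2
```

With a single replica at 1010 percent against a 50 percent target, the uncapped result is 21, so the maximum of 4 is what limits the replica count in the output above.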

Verify that the target Deployment reached the HPA replica count.
```
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   4/4     4            4           72s
```
If the HPA reports a higher replica count but AVAILABLE stays lower, inspect Pods and recent Events before raising the maximum.
Related: How to check Kubernetes events

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.