A HorizontalPodAutoscaler lets Kubernetes adjust the replica count of a scalable workload as CPU, memory, or another supported metric changes. It suits stateless Deployments and similar controller targets, where adding Pods spreads work without changing the resources assigned to each Pod.
CPU- and memory-based autoscaling depends on the cluster resource metrics pipeline. Metrics Server publishes the metrics.k8s.io API, and utilization targets need resource requests on the target containers so the autoscaler can compare current usage with requested capacity.
Choose replica bounds that fit node capacity, rollout surge, and availability expectations before applying an HPA to production. The target must expose the scale subresource, so a Deployment or StatefulSet is a normal target while a DaemonSet is not.
Related: How to install Kubernetes Metrics Server
Related: How to create a Kubernetes Deployment
Related: How to scale a Kubernetes Deployment
Steps to configure Kubernetes horizontal pod autoscaling:
- Check that the resource Metrics API is available.
$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        37s
If AVAILABLE is not True, install or repair Metrics Server before creating a CPU- or memory-based HPA.
Related: How to install Kubernetes Metrics Server
- Check the target Deployment before attaching autoscaling.
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   1/1     1            1           6s
Use a context and namespace that point at the intended workload before changing replica automation.
Related: How to check Kubernetes cluster access
- Confirm that the target container has a CPU request.
$ kubectl get deployment worker --namespace team-a -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}{"\n"}'
10m
CPU utilization is calculated as a percentage of requested CPU. A missing CPU request leaves CPU utilization undefined for that Pod.
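As a rough illustration of that percentage, the sketch below divides a usage figure by the 10m request shown above. The 101m usage value is a hypothetical sample, not a measurement, and the helper name to_millicores is made up for this example. It only understands whole cores and millicore quantities.

```shell
#!/bin/sh
# Sketch: CPU utilization as a percentage of the requested CPU.
# Assumption: quantities are whole cores ("2") or millicores ("10m");
# fractional cores such as "0.5" are not handled here.

to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;           # "10m" -> 10
    *)  echo $(( $1 * 1000 )) ;;   # "2"   -> 2000
  esac
}

request=$(to_millicores "10m")   # CPU request read from the Deployment
usage=$(to_millicores "101m")    # hypothetical average usage per Pod

if [ "$request" -eq 0 ]; then
  # Without a request there is nothing to divide by.
  echo "cpu: <unknown>"
else
  # Integer percentage, rounded down.
  utilization=$(( usage * 100 / request ))
  echo "cpu: ${utilization}%"
fi
```

With a request as small as 10m, even light activity produces a utilization far above 100 percent, which is why small requests make an HPA scale out quickly.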
Tool: Kubernetes Resource Requests Checker
- Create the HPA for the target Deployment.
$ kubectl autoscale deployment/worker --namespace team-a --cpu=50% --min=1 --max=4
horizontalpodautoscaler.autoscaling/worker autoscaled
--cpu sets the average CPU utilization target, here 50 percent. --min and --max set the replica range the HPA can write to the Deployment.
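For clusters managed from version-controlled manifests, the same settings can be written declaratively. The sketch below writes an autoscaling/v2 manifest that should be equivalent to the command above; the file name hpa-worker.yaml is an arbitrary choice for this example, and the manifest assumes the target is an apps/v1 Deployment named worker in team-a.

```shell
#!/bin/sh
# Sketch: write a declarative HPA manifest mirroring the kubectl autoscale
# settings (CPU target 50 percent, 1 to 4 replicas).
# The file name is an assumption made for this example.
cat > hpa-worker.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF
echo "wrote hpa-worker.yaml"
```

The file can then be applied with kubectl apply -f hpa-worker.yaml in place of the imperative command, not in addition to it.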
- Give the workload enough CPU pressure for the autoscaler to calculate a new replica count.
The HPA may show <unknown> for a short time after creation while the controller waits for metrics. A disposable CPU-bound workload can make scale-out visible in a lab namespace without a separate traffic generator.
- Check the HPA after metrics are available.
$ kubectl get hpa worker --namespace team-a
NAME     REFERENCE           TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
worker   Deployment/worker   cpu: 1010%/50%   1         4         4          46s
TARGETS shows current usage against the configured target. REPLICAS shows the replica count selected by the HPA, capped by MAXPODS.
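The capping can be checked by hand with the scaling formula from the Kubernetes documentation: desired replicas are the current replicas multiplied by the ratio of current to target utilization, rounded up. The sketch below plugs in the values from the output above. It is a simplification that leaves out the controller's tolerance band, stabilization window, and handling of Pods with missing metrics.

```shell
#!/bin/sh
# Sketch of the documented HPA formula:
#   desired = ceil(currentReplicas * currentUtilization / targetUtilization)
# followed by clamping to the configured replica range.
# Tolerance, stabilization, and missing-metric handling are ignored.

current_replicas=4
current_utilization=1010   # percent, left side of TARGETS
target_utilization=50      # percent, right side of TARGETS
min_replicas=1
max_replicas=4

# Integer ceiling division: (a + b - 1) / b
raw=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))

desired=$raw
if [ "$desired" -lt "$min_replicas" ]; then
  desired=$min_replicas
fi
if [ "$desired" -gt "$max_replicas" ]; then
  desired=$max_replicas
fi

echo "uncapped: $raw, desired: $desired"
```

An uncapped result far above MAXPODS suggests that the maximum, the CPU request, or the target percentage may be worth revisiting for the real workload.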
- Verify that the target Deployment reached the HPA replica count.
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   4/4     4            4           72s
If the HPA reports a higher replica count but AVAILABLE stays lower, inspect Pods and recent Events before raising the maximum.
Related: How to check Kubernetes events
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.