A HorizontalPodAutoscaler lets Kubernetes adjust the replica count of a scalable workload as CPU, memory, or another supported metric changes. It suits stateless Deployments and similar controller-managed workloads, where adding Pods spreads the load without changing the resources assigned to each Pod.
CPU- and memory-based autoscaling depends on the cluster's resource metrics pipeline. Metrics Server serves the metrics.k8s.io API, and utilization targets require requests for the same resource on the target containers so the autoscaler can compare current usage with the requested amount.
Choose replica bounds that fit node capacity, rollout surge, and availability expectations before applying an HPA to production. The target must expose the scale subresource, so a Deployment or StatefulSet is a normal target while a DaemonSet is not.
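The scale subresource requirement can be checked directly against the API server. The helper below is a hypothetical POSIX shell function, meant to be sourced, that assumes an apps/v1 resource type and a working kubectl context. It relies on a GET of the object's /scale path succeeding for Deployments and StatefulSets and failing for DaemonSets, which have no such subresource.

```shell
# Hypothetical helper: succeeds when <resource>/<name> in <namespace> exposes
# the scale subresource. Assumes an apps/v1 resource type (deployments,
# statefulsets, daemonsets, ...) and a working kubectl context.
has_scale_subresource() {
  local resource=$1 name=$2 namespace=$3
  # The raw GET fails if the subresource is missing or the object does not exist.
  kubectl get --raw "/apis/apps/v1/namespaces/${namespace}/${resource}/${name}/scale" \
    >/dev/null 2>&1
}
```

For example, `has_scale_subresource deployments worker team-a && echo scalable`. A failure does not distinguish a missing subresource from a missing object, so confirm the workload exists first.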
Related: How to install Kubernetes Metrics Server
Related: How to create a Kubernetes Deployment
Related: How to scale a Kubernetes Deployment
$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        37s
If AVAILABLE is not True, install or repair Metrics Server before creating a CPU- or memory-based HPA.
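A script that creates HPAs can gate on this condition instead of reading the table by eye. This is a minimal sketch assuming kubectl access; the function name metrics_api_available is hypothetical, while the jsonpath query reads the standard Available condition in the APIService status.

```shell
# Hypothetical helper: exit status 0 only when the metrics.k8s.io APIService
# reports its Available condition as True.
metrics_api_available() {
  local status
  # Select the status of the condition whose type is "Available".
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}
```

For example: `metrics_api_available || echo 'install or repair Metrics Server first'`.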
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   1/1     1            1           6s
Use a context and namespace that point at the intended workload before changing replica automation.
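To make the target explicit before changing anything, a small helper can print the active context and confirm that the Deployment exists. confirm_target is a hypothetical name; `kubectl config current-context` and `-o name` are standard kubectl features.

```shell
# Hypothetical helper: show which cluster context is active, then fail
# (non-zero exit) unless the Deployment exists in the given namespace.
confirm_target() {
  local deployment=$1 namespace=$2
  echo "context: $(kubectl config current-context)"
  # -o name prints only the resource name, e.g. deployment.apps/worker.
  kubectl get deployment "$deployment" --namespace "$namespace" -o name
}
```

For example: `confirm_target worker team-a`.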
Related: How to check Kubernetes cluster access
$ kubectl get deployment worker --namespace team-a -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}{"\n"}'
10m
CPU utilization is calculated as a percentage of the requested CPU. If any container in a Pod lacks a CPU request, CPU utilization is undefined for that Pod and the autoscaler takes no action on the CPU metric.
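The command above reads only the first container. Because every container in the Pod needs a CPU request, a sketch like the following lists each regular container in the Pod template that has none. The helper name is hypothetical, it assumes kubectl access, and it does not check init containers.

```shell
# Hypothetical helper: print the name of every regular container in the
# Deployment's Pod template that has no CPU request (init containers are not checked).
containers_missing_cpu_request() {
  local deployment=$1 namespace=$2
  # One "name=request" line per container; the request part is empty when unset.
  kubectl get deployment "$deployment" --namespace "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"="}{.resources.requests.cpu}{"\n"}{end}' |
    awk -F= '$2 == "" { print $1 }'
}
```

Empty output means every listed container has a CPU request.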
$ kubectl autoscale deployment/worker --namespace team-a --cpu=50% --min=1 --max=4
horizontalpodautoscaler.autoscaling/worker autoscaled
--cpu sets the average CPU utilization target, here 50 percent. --min and --max set the replica range the HPA can write to the Deployment.
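The same HPA can also be written declaratively with the autoscaling/v2 API, which keeps it reviewable in version control. hpa_manifest is a hypothetical helper that prints a manifest roughly equivalent to the kubectl autoscale command above; it assumes the HPA shares the Deployment's name, which is what kubectl autoscale does by default.

```shell
# Hypothetical helper: print an autoscaling/v2 HPA manifest for a Deployment
# of the same name, targeting average CPU utilization as a percentage of requests.
hpa_manifest() {
  local name=$1 namespace=$2 min=$3 max=$4 cpu_percent=$5
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${name}
  minReplicas: ${min}
  maxReplicas: ${max}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: ${cpu_percent}
EOF
}
```

For example: `hpa_manifest worker team-a 1 4 50 | kubectl apply -f -`.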
TARGETS may show <unknown> for a short time after creation while the controller waits for the first metrics. In a lab namespace, a disposable CPU-bound workload makes scale-out visible without a separate traffic generator.
$ kubectl get hpa worker --namespace team-a
NAME     REFERENCE           TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
worker   Deployment/worker   cpu: 1010%/50%   1         4         4          46s
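TARGETS shows current average usage against the configured target. REPLICAS shows the target's current replica count, which the HPA keeps between MINPODS and MAXPODS; with usage far above the 50% target, the count stops at the maximum of 4.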
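The replica count follows the scaling rule in the Kubernetes HPA documentation: desired = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured minimum and maximum. The sketch below reproduces only that arithmetic with shell integer math. It deliberately leaves out the default tolerance, stabilization windows, and the handling of unready or missing Pods that the real controller applies, and both function names are hypothetical.

```shell
# Hypothetical helpers reproducing only the core arithmetic of the documented
# HPA rule; the real controller also applies a tolerance (0.1 by default),
# stabilization windows, and special handling for unready or missing Pods.

# Utilization relative to the request, as a truncated integer percentage.
cpu_utilization_pct() {
  local usage_millicores=$1 request_millicores=$2
  echo $(( usage_millicores * 100 / request_millicores ))
}

# desired = ceil(current * usage_pct / target_pct), clamped to [min, max].
desired_replicas() {
  local current=$1 usage_pct=$2 target_pct=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for a >= 0, b > 0.
  local desired=$(( (current * usage_pct + target_pct - 1) / target_pct ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}
```

As a hypothetical illustration, with a 10m request an average usage of 101m gives `cpu_utilization_pct 101 10` = 1010. If one replica reported that 1010% against a 50% target, the rule would ask for ceil(1 × 1010 / 50) = 21 replicas, which `desired_replicas 1 1010 50 1 4` clamps to 4.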
$ kubectl get deployment worker --namespace team-a
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
worker   4/4     4            4           72s
If the HPA reports a higher replica count but AVAILABLE stays lower, inspect Pods and recent Events before raising the maximum.
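A scripted version of that check compares the HPA's desired count with the Deployment's available count. check_scale_gap is a hypothetical helper that assumes the HPA and Deployment share a name. .status.desiredReplicas and .status.availableReplicas are standard status fields; the latter is omitted when it is zero, so the sketch defaults missing values to 0.

```shell
# Hypothetical helper: compare the replica count the HPA wants with the number
# of available Pods in the Deployment it scales (assumed to share its name).
check_scale_gap() {
  local name=$1 namespace=$2 desired available
  desired=$(kubectl get hpa "$name" --namespace "$namespace" -o jsonpath='{.status.desiredReplicas}')
  available=$(kubectl get deployment "$name" --namespace "$namespace" -o jsonpath='{.status.availableReplicas}')
  # availableReplicas is omitted from the status when it is zero; default to 0.
  desired=${desired:-0}
  available=${available:-0}
  if [ "$available" -lt "$desired" ]; then
    echo "gap: HPA wants ${desired}, ${available} available; inspect Pods and Events"
    return 1
  fi
  echo "ok: ${available}/${desired} available"
}
```

For example: `check_scale_gap worker team-a || kubectl get events --namespace team-a`.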
Related: How to check Kubernetes events