Monitoring a Kubernetes cluster in Checkmk connects the cluster API, the Checkmk Kubernetes collectors, and Checkmk service discovery so nodes, pods, workloads, and usage data appear in monitoring. The connection matters when Kubernetes objects change often enough that manual host creation cannot keep pace with the cluster.
Checkmk reads basic cluster state through the Kubernetes special agent, while the Checkmk Node Collector and Cluster Collector provide usage data such as CPU, memory, and filesystem metrics. The collectors are installed in the cluster with the official Helm chart, and Checkmk queries the API server plus the Cluster Collector endpoint from a dedicated cluster host.
The examples use a NodePort endpoint for the Cluster Collector because it is easy to verify from the shell. Use an Ingress endpoint instead when that is the approved exposure path for the cluster, and keep the service account token and CA certificate out of screenshots, shell history, and saved troubleshooting notes.
$ helm repo add checkmk-chart https://checkmk.github.io/checkmk_kube_agent
"checkmk-chart" has been added to your repositories
$ helm show chart checkmk-chart/checkmk
apiVersion: v2
appVersion: 1.11.0
description: Helm chart for Checkmk - Your complete IT monitoring solution
icon: https://checkmk.com/application/files/thumbnails/low_res/9515/9834/3872/checkmk_icon_main.png
kubeVersion: '>=1.19.0-0'
name: checkmk
type: application
version: 1.11.0
The current chart declares the Kubernetes version range it supports. Stop here if the cluster is older than the chart's kubeVersion value.
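The version comparison can be scripted before installing. The following is a minimal sketch, assuming GNU `sort -V` is available; the helper names `normalize_version` and `version_ge` are hypothetical, and the minimum of 1.19.0 is taken from the `kubeVersion` line in the chart output. The commented lookup additionally assumes `jq` is installed.

```shell
# normalize_version: drop a leading "v" and any "-" or "+" suffix
# (for example "v1.28.3+k3s1" becomes "1.28.3").
normalize_version() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/[-+].*$//'
}

# version_ge A B: succeed when version A is greater than or equal to version B.
# Uses version sort, so 1.9.0 is correctly treated as older than 1.19.0.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On a live cluster the server version could be read like this
# (left as a comment so the sketch runs without cluster access):
# server=$(kubectl version --output json | jq -r '.serverVersion.gitVersion')
server="v1.28.3+k3s1"   # example value, not taken from a real cluster

if version_ge "$(normalize_version "$server")" "1.19.0"; then
  echo "cluster version is within the chart's range"
else
  echo "cluster is older than the chart supports"
fi
```

Comparing with version sort rather than plain string comparison avoids misordering two-digit minor versions.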
clusterCollector:
  service:
    type: NodePort
    nodePort: 30035
Use the chart's clusterCollector.ingress settings instead when the cluster exposes services through Ingress. Keep the same endpoint choice through the Checkmk rule so the monitoring server queries the reachable address.
$ helm upgrade --install --create-namespace --namespace checkmk-monitoring myrelease checkmk-chart/checkmk -f values.yaml
Release "myrelease" has been upgraded. Happy Helming!
NAME: myrelease
NAMESPACE: checkmk-monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You can access the checkmk cluster-collector via:
NodePort: http://10.0.12.40:30035
The collectors run with permissions that let Checkmk read cluster, node, pod, and workload state. Install them in a dedicated namespace and review the chart values before applying them to a production cluster.
$ helm status --namespace checkmk-monitoring myrelease
NAME: myrelease
NAMESPACE: checkmk-monitoring
STATUS: deployed
REVISION: 1
$ kubectl get service --namespace checkmk-monitoring myrelease-checkmk-cluster-collector
NAME                                  TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
myrelease-checkmk-cluster-collector   NodePort   10.96.178.75   <none>        8080:30035/TCP   2m
For NodePort access, combine a reachable node address with the node port. For Ingress access, use the hostname or URL shown by the ingress controller.
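As a rough illustration of the NodePort case, the endpoint can be assembled from the two values. The helper name `collector_url` is hypothetical, and the address and port are the example values used in this guide. The commented lookups use kubectl's JSONPath output and assume the first node's internal IP is reachable from the Checkmk server, which may not hold in every network.

```shell
# collector_url SCHEME HOST PORT: print the endpoint URL to enter in Checkmk.
# Hypothetical helper, not part of kubectl or the Helm chart.
collector_url() {
  printf '%s://%s:%s\n' "$1" "$2" "$3"
}

# On a live cluster the two values could be looked up like this
# (left as comments so the sketch runs without cluster access):
# node_ip=$(kubectl get nodes --output jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# node_port=$(kubectl get service --namespace checkmk-monitoring myrelease-checkmk-cluster-collector --output jsonpath='{.spec.ports[0].nodePort}')
node_ip="10.0.12.40"   # example node address from this guide
node_port="30035"      # example node port from this guide

collector_url http "$node_ip" "$node_port"
# prints http://10.0.12.40:30035
```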
$ TOKEN=$(kubectl get secret myrelease-checkmk-checkmk --namespace checkmk-monitoring --output jsonpath='{.data.token}' | base64 --decode)
Do not print real tokens into shared logs or screenshots. Copy the token directly into the Checkmk password store in the next Checkmk-side step.
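When a script needs to confirm that the token variable is populated, one option is to log a placeholder instead of the value. This is only a sketch; `redact_secret` is a hypothetical helper and the sample value is made up.

```shell
# redact_secret VALUE: print a placeholder that reveals only the length of VALUE.
redact_secret() {
  printf '<redacted, %d characters>\n' "${#1}"
}

# Made-up sample value standing in for the real service account token.
SAMPLE_TOKEN="abcdefghij"

echo "token: $(redact_secret "$SAMPLE_TOKEN")"
# prints token: <redacted, 10 characters>
```

A length of zero in the placeholder would indicate that the secret lookup returned nothing, without anything sensitive reaching the log.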
$ kubectl get secret myrelease-checkmk-checkmk --namespace checkmk-monitoring --output jsonpath='{.data.ca\.crt}' | base64 --decode
-----BEGIN CERTIFICATE-----
MIIBdjCCAR2gAwIBAgIBADAKBggqhkjOPQQDAjAjMSEwHwYDVQQDDBhrM3Mtc2Vy
##### snipped #####
-----END CERTIFICATE-----
Copy the full certificate, including the BEGIN CERTIFICATE and END CERTIFICATE lines.
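A quick way to catch a truncated copy is to count the marker lines in the saved file. The sketch below assumes the certificate was saved to a file; `is_complete_pem` is a hypothetical helper, and the sample content is a placeholder rather than a real certificate.

```shell
# is_complete_pem FILE: succeed when FILE contains at least one
# BEGIN CERTIFICATE marker and the same number of END CERTIFICATE markers.
is_complete_pem() {
  begin=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true)
  end=$(grep -c -- '-----END CERTIFICATE-----' "$1" || true)
  [ "$begin" -ge 1 ] && [ "$begin" -eq "$end" ]
}

# Placeholder file standing in for the saved CA certificate.
sample=$(mktemp)
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'MIIB' '-----END CERTIFICATE-----' > "$sample"

if is_complete_pem "$sample"; then
  echo "markers are balanced"
else
  echo "certificate copy looks truncated"
fi
rm -f "$sample"
```

This checks only the markers, not the certificate itself. If OpenSSL is installed, `openssl x509 -noout -subject -enddate -in ca.crt` (with `ca.crt` as an assumed file name) parses the content and fails on a damaged body.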
$ curl --header "Authorization: Bearer $TOKEN" http://10.0.12.40:30035/metadata
{
  "cluster_collector_metadata": {
    "host_name": "myrelease-checkmk-cluster-collector-7d8c6f8b5d-lxq2m",
    "checkmk_kube_agent": {
      "project_version": "1.11.0"
    }
  }
}
Replace the URL with the NodePort or Ingress endpoint that the Checkmk server can reach.
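To use the metadata check in a script, the collector version can be pulled out of the response. The following sketch assumes the response has the shape shown in the sample output and uses `sed` rather than a JSON parser, so it is a convenience check and not a general JSON reader; `collector_version` is a hypothetical helper.

```shell
# collector_version: read the /metadata response on stdin and print
# the first "project_version" value found.
collector_version() {
  sed -n 's/.*"project_version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response copied from the curl output in this guide.
response='{
  "cluster_collector_metadata": {
    "host_name": "myrelease-checkmk-cluster-collector-7d8c6f8b5d-lxq2m",
    "checkmk_kube_agent": {
      "project_version": "1.11.0"
    }
  }
}'

printf '%s\n' "$response" | collector_version
# prints 1.11.0

# Against a live endpoint the same helper could be fed by curl:
# curl --silent --header "Authorization: Bearer $TOKEN" http://10.0.12.40:30035/metadata | collector_version
```

An empty result suggests the endpoint is unreachable, the token was rejected, or the response format differs from the sample.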
Use a title such as Kubernetes production token so the later Kubernetes rule can select the entry without exposing the token value.
The Kubernetes rule can then use certificate verification instead of disabling TLS checks for the API server.
The cluster host receives the special-agent and piggyback data at cluster level; it is not a host that Checkmk should ping directly.
Related: How to create a Checkmk piggyback host
In commercial editions, create a connection under Setup → Hosts → Dynamic host management and restrict the source host to the cluster host. In Checkmk Community, use the piggyback orphan list and create the Kubernetes object hosts manually.
Set the cluster name, select the stored token, enter the Kubernetes API server endpoint, enable certificate verification, enable Enrich with usage data from Checkmk Cluster Collector, and enter the Cluster Collector endpoint.
Related: How to create a Checkmk rule for selected hosts
Set Conditions → Explicit hosts to the cluster host. A broader condition can run the Kubernetes special agent for unrelated hosts and create confusing discovery results.
Accept the Kubernetes API and Cluster Collector services when discovery finds them.
Related: How to run Checkmk service discovery
Activation sends the saved host, rule, password-store, certificate, and discovery changes to monitoring.
Related: How to activate Checkmk pending changes
The cluster host should show Kubernetes API with Live, Ready in the summary, and Cluster Collector should show the collector version. In commercial editions, Monitor → Applications → Kubernetes should show CPU and memory resource data, and the Kubernetes Cluster dashboard should show Primary datasource, Cluster collector, and API health as OK.