How to back up and restore the LGTM stack

Backing up and restoring the LGTM stack protects the configuration and data paths that make Grafana, Loki, Tempo, and Mimir usable after a cluster failure or migration. A recovery plan should cover Helm values, secret references, Grafana dashboards and data sources, and backend object storage.

Object storage usually holds the durable log, trace, and metric blocks, while Kubernetes holds release state, secret references, and service configuration. Grafana may also use a database or persistent volume that needs a separate backup path.

Do not call a backup complete until a clean namespace or staging cluster can restore the stack and query retained telemetry. A restore drill should prove that Grafana comes back with the expected data sources and that each backend can return at least one known signal.
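A restore drill is easier to repeat when each proof is a named check that either passes or fails. The following is a minimal sketch in POSIX shell, not a definitive implementation: the `check` helper and the `drill_failures` counter are hypothetical names introduced here, and the commented usage reuses the example hostnames from the steps in this guide.

```shell
#!/bin/sh
# Minimal restore-drill runner (a sketch; `check` and `drill_failures`
# are illustrative names, not part of any LGTM tooling).

drill_failures=0

# check NAME COMMAND [ARGS...]
# Runs COMMAND quietly, prints PASS or FAIL with NAME, and counts failures.
check() {
  check_name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $check_name"
  else
    echo "FAIL  $check_name"
    drill_failures=$((drill_failures + 1))
  fi
}

# Example usage (commented out so the sketch has no side effects when sourced):
#   check "grafana health" \
#     curl --silent --fail https://grafana-restore.example.com/api/health
#   check "mimir query api" \
#     curl --silent --fail --get \
#       https://metrics-restore.example.com/prometheus/api/v1/query \
#       --data-urlencode 'query=up'
#   [ "$drill_failures" -eq 0 ] || exit 1
```

A non-zero `drill_failures` at the end of a run gives the drill a single result to report. These example checks only show that endpoints answer, so a real drill would add checks against known retained signals.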

Steps to back up and restore the LGTM stack:

  1. Create a backup workspace outside the repository root.
    $ mkdir -p ~/lgtm-backup/values ~/lgtm-backup/grafana
    $ chmod 700 ~/lgtm-backup
  2. Export Helm release values.
    $ helm get values grafana --namespace monitoring --all --output yaml \
      > ~/lgtm-backup/values/grafana.yaml
    $ helm get values loki --namespace monitoring --all --output yaml \
      > ~/lgtm-backup/values/loki.yaml
    $ helm get values tempo --namespace monitoring --all --output yaml \
      > ~/lgtm-backup/values/tempo.yaml
    $ helm get values mimir --namespace monitoring --all --output yaml \
      > ~/lgtm-backup/values/mimir.yaml

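    Review exported values before storing them. Depending on how the release was installed, they can contain inline credentials as well as secret references. Because --all writes computed values, including chart defaults, restore them with the chart versions recorded in the next step.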

  3. Record chart versions and revisions.
    $ helm list --namespace monitoring > ~/lgtm-backup/helm-list.txt
    $ for release in grafana loki tempo mimir; do
        helm history "$release" --namespace monitoring > ~/lgtm-backup/"$release"-history.txt
      done
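  4. Export Grafana data sources through the API.
    $ curl --silent --fail --user admin:<password> \
      https://grafana.example.com/api/datasources \
      > ~/lgtm-backup/grafana/datasources.json

    The API response does not include secret values such as data source passwords and tokens, so those credentials must be restored from their original secret source.
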
  5. Export the Grafana dashboard index, then export the dashboards through the approved backup mechanism.
    $ curl --silent --fail --user admin:<password> \
      https://grafana.example.com/api/search \
      > ~/lgtm-backup/grafana/dashboard-index.json

    Use the organization's existing dashboard backup tool if one is already in place. The search index lists dashboards and folders but does not contain the dashboard JSON, so it is not a full dashboard backup on its own.

  6. Record object storage prefixes for each backend.
    $ aws s3 ls s3://lgtm-loki-prod/ --recursive
    2026-06-21 08:10:00      12000 chunks/tenant-a/...
    $ aws s3 ls s3://lgtm-tempo-prod/ --recursive
    2026-06-21 08:10:01      18000 traces/...
    $ aws s3 ls s3://lgtm-mimir-prod/blocks/ --recursive
    2026-06-21 08:10:02      22000 blocks/...
  7. Create a clean namespace and restore Loki into it.
    $ kubectl create namespace monitoring-restore
    namespace/monitoring-restore created
    $ helm upgrade --install loki grafana/loki \
      --namespace monitoring-restore \
      --values ~/lgtm-backup/values/loki.yaml \
      --wait
  8. Restore Tempo and Mimir with their saved values.
    $ helm upgrade --install tempo grafana/tempo-distributed \
      --namespace monitoring-restore \
      --values ~/lgtm-backup/values/tempo.yaml \
      --wait
    $ helm upgrade --install mimir grafana/mimir-distributed \
      --namespace monitoring-restore \
      --values ~/lgtm-backup/values/mimir.yaml \
      --wait
  9. Restore Grafana after backend endpoints are reachable.
    $ helm upgrade --install grafana grafana/grafana \
      --namespace monitoring-restore \
      --values ~/lgtm-backup/values/grafana.yaml \
      --wait
  10. Check restored release state.
    $ helm list --namespace monitoring-restore
    NAME      NAMESPACE            STATUS
    grafana   monitoring-restore   deployed
    loki      monitoring-restore   deployed
    mimir     monitoring-restore   deployed
    tempo     monitoring-restore   deployed
  11. Query Grafana health in the restored namespace.
    $ curl --silent https://grafana-restore.example.com/api/health
    {"database":"ok","version":"13.0.1"}
  12. Run one restored data query for each backend.
    $ curl --silent --get https://metrics-restore.example.com/prometheus/api/v1/query \
      --data-urlencode 'query=up'
    {"status":"success","data":{"resultType":"vector","result":[]}}

    An empty vector proves the API responded, not that historical data exists. Use a known retained series, log stream, and trace ID for the final restore drill.
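The distinction in the last step, between an API that answers and a backend that returns retained data, can be checked mechanically. The sketch below is one hedged way to do it in POSIX shell. `has_samples` is a hypothetical helper, and it assumes the compact JSON shape shown in the sample response above (`"status":"success"` with a `result` array) rather than parsing JSON properly.

```shell
#!/bin/sh
# has_samples RESPONSE
# Succeeds only when a Prometheus-style query response reports success
# and its result array is not empty. Assumes compact JSON with no spaces
# around ":" (an assumption; use a real JSON parser for anything stricter).
has_samples() {
  case $1 in
    *'"status":"success"'*) ;;   # API answered successfully; keep checking
    *) return 1 ;;               # error status or not a query response
  esac
  case $1 in
    *'"result":[]'*) return 1 ;; # success, but no series or streams returned
  esac
  return 0
}

# Example usage (commented out; the series name is a placeholder):
#   response=$(curl --silent --get \
#     https://metrics-restore.example.com/prometheus/api/v1/query \
#     --data-urlencode 'query=<known-retained-series>')
#   has_samples "$response" || echo "no retained data returned" >&2
```

Loki query responses use a similar `status` and `result` envelope, so the same check may also work for a known log stream. Confirm the response shape against the deployed version before relying on it.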