Parsing log lines into named fields keeps searches, dashboards, and alert rules tied to the actual request attributes instead of brittle text fragments. Once the client address, request path, response code, and user agent live in separate fields, Elasticsearch queries and aggregations stay reliable even when traffic volume and log sources grow.
The grok filter matches text against reusable patterns and writes the captures back into the event. Current pattern files include ECS-aware HTTPD patterns such as HTTPD_COMBINEDLOG, which map a standard combined access-log line into nested fields like [source][address], [http][request][method], and [http][response][status_code].
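Stripped to its essentials, a grok match is a sequence of %{SYNTAX:field} captures, where SYNTAX names a shipped pattern and field names the target. The log shape and field names below are illustrative, not part of the shipped pattern files; TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA are standard bundled patterns:

```
filter {
  grok {
    # Hypothetical app log line: "2026-04-07T08:17:29Z INFO checkout completed in 42ms"
    # Each %{SYNTAX:field} capture writes the matched text into the named field.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:[app][time]} %{LOGLEVEL:[log][level]} %{GREEDYDATA:[app][note]}"
    }
  }
}
```

Shipped aggregates such as HTTPD_COMBINEDLOG are built from exactly these captures, so the same syntax applies when a custom format needs its own pattern.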
Regex flexibility also makes grok more expensive to run and more prone to drift than delimiter-based parsing. Explicit ecs_compatibility => "v8" keeps field names predictable, timeout_scope => "event" reduces per-pattern timeout overhead, and the parsed timestamp still needs the date filter if it should replace @timestamp. On package-based Linux installs, keep pipeline fragments under /etc/logstash/conf.d and test them before restarting the Logstash service.
Related: How to use the Logstash dissect filter
Related: How to use the Logstash date filter
Steps to parse logs with grok in Logstash:
- Dry-run the pattern against a representative log line before editing the live pipeline.
$ printf '%s\n' '203.0.113.11 - - [07/Apr/2026:08:17:29 +0000] "GET /grok-demo HTTP/1.1" 200 512 "https://www.example.net/" "Mozilla/5.0"' | \
    sudo -u logstash /usr/share/logstash/bin/logstash \
    --path.settings /etc/logstash \
    --path.data /tmp/logstash-grok-dryrun \
    -e 'input { stdin {} } filter { grok { id => "grok_apache_access" ecs_compatibility => "v8" match => { "message" => "%{HTTPD_COMBINEDLOG}" } tag_on_failure => ["_grokparsefailure_apache"] timeout_scope => "event" } } output { stdout { codec => rubydebug { metadata => false } } }'
{
    "source" => {
        "address" => "203.0.113.11"
    },
    "http" => {
        "request" => {
            "method" => "GET",
            "referrer" => "https://www.example.net/"
        },
        "response" => {
            "status_code" => 200
        }
    },
    "url" => {
        "original" => "/grok-demo"
    }
}
Current pattern files prefer HTTPD_COMBINEDLOG. Older examples often use COMBINEDAPACHELOG, which is kept as a deprecated alias. Explicit ecs_compatibility => "v8" keeps the captured field names aligned with current ECS-style pipelines.
- Add the grok filter to a dedicated pipeline fragment such as /etc/logstash/conf.d/50-grok.conf.
input {
  file {
    path => "/var/lib/logstash/examples/apache-access.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb-grok-apache"
    tags => ["grok_demo"]
  }
}

filter {
  if "grok_demo" in [tags] {
    grok {
      id => "grok_apache_access"
      ecs_compatibility => "v8"
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
      tag_on_failure => ["_grokparsefailure_apache"]
      timeout_scope => "event"
    }
  }
}

output {
  if "grok_demo" in [tags] {
    elasticsearch {
      hosts => ["http://elasticsearch.example.net:9200"]
      index => "app-grok-%{+YYYY.MM.dd}"
    }
  }
}
The example stays scoped to one tagged input so the pattern only runs on the intended events. When the pipeline already has its own inputs and outputs, reuse only the grok block inside the relevant existing filter flow.
The ECS HTTPD pattern already types fields such as [http][response][status_code] and [http][response][body][bytes] as integers. Apply How to use the Logstash date filter if the parsed timestamp field should replace @timestamp.
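A short sketch of that hand-off: the date filter below assumes the raw access-log time landed in the timestamp field, as shown in the dry-run output above, and uses the standard combined-log time layout:

```
filter {
  date {
    # Parse the raw access-log time (e.g. "07/Apr/2026:08:17:29 +0000") into @timestamp.
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    # Optional: drop the raw string once @timestamp carries the parsed value.
    remove_field => ["timestamp"]
  }
}
```

Without this step, @timestamp records ingestion time rather than the time the request was served, which skews time-based dashboards.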
- Test the updated pipeline configuration with the packaged settings directory.
$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --path.data /tmp/logstash-configtest --config.test_and_exit
Using bundled JDK: /usr/share/logstash/jdk
Configuration OK
- Restart the Logstash service to load the updated pipeline.
$ sudo systemctl restart logstash
Restarting Logstash briefly pauses ingestion while the JVM reloads the pipeline.
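Where that pause is disruptive, Logstash's automatic config reload is an alternative to full restarts; these are standard settings, shown here as a sketch for /etc/logstash/logstash.yml:

```
# /etc/logstash/logstash.yml
# Re-read pipeline configuration on a timer instead of restarting the service.
config.reload.automatic: true
config.reload.interval: 3s
```

Note that reload applies to pipeline configuration changes only; changes to logstash.yml itself still require a service restart.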
- Query the monitoring API and confirm the named grok filter is receiving and emitting events.
$ curl --silent --show-error "http://localhost:9600/_node/stats/pipelines/main?pretty=true&filter_path=pipelines.main.plugins.filters.id,pipelines.main.plugins.filters.name,pipelines.main.plugins.filters.events"
{
  "pipelines" : {
    "main" : {
      "plugins" : {
        "filters" : [ {
          "id" : "grok_apache_access",
          "name" : "grok",
          "events" : {
            "in" : 1,
            "out" : 1
          }
        } ]
      }
    }
  }
}
If the pipeline ID is not main, replace it in the URL. When the API is bound to another host or secured with TLS or basic authentication, adjust the endpoint and credentials to match logstash.yml.
- Search the destination index for parse-failure tags after the rollout.
$ curl -s -G "http://elasticsearch.example.net:9200/app-grok-*/_search" \
    --data-urlencode "q=tags:_grokparsefailure_apache" \
    --data-urlencode "size=0" \
    --data-urlencode "filter_path=hits.total" \
    --data-urlencode "pretty"
{
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    }
  }
}
A non-zero count means the pattern no longer matches at least some incoming lines. Split different line shapes with conditionals, tighten the pattern, or fall back to How to use the Logstash dissect filter when the format is fixed enough for delimiter-based parsing.
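One way to handle mixed line shapes without an explicit conditional is grok's pattern list, which tries each pattern in order and stops at the first match. The sketch below pairs the combined-log pattern with HTTPD_COMMONLOG, the shipped pattern for common-format lines that lack the referrer and user-agent fields:

```
filter {
  grok {
    ecs_compatibility => "v8"
    # Patterns are tried in order; the first successful match wins.
    match => {
      "message" => [
        "%{HTTPD_COMBINEDLOG}",
        "%{HTTPD_COMMONLOG}"
      ]
    }
    tag_on_failure => ["_grokparsefailure_apache"]
    timeout_scope => "event"
  }
}
```

Put the most specific (or most common) pattern first, since every pattern before the matching one is attempted and discarded on each event.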
- Fetch a recent document and confirm the parsed fields are present in the indexed event.
$ curl -s -G "http://elasticsearch.example.net:9200/app-grok-*/_search" \
    --data-urlencode "size=1" \
    --data-urlencode "sort=@timestamp:desc" \
    --data-urlencode "filter_path=hits.hits._index,hits.hits._source.timestamp,hits.hits._source.source.address,hits.hits._source.http.request.method,hits.hits._source.http.version,hits.hits._source.http.response.status_code,hits.hits._source.http.response.body.bytes,hits.hits._source.http.request.referrer,hits.hits._source.url.original,hits.hits._source.user_agent.original" \
    --data-urlencode "pretty"
{
  "hits" : {
    "hits" : [ {
      "_index" : "app-grok-2026.04.07",
      "_source" : {
        "timestamp" : "07/Apr/2026:08:17:29 +0000",
        "source" : {
          "address" : "203.0.113.11"
        },
        "http" : {
          "version" : "1.1",
          "request" : {
            "method" : "GET",
            "referrer" : "https://www.example.net/"
          },
          "response" : {
            "status_code" : 200,
            "body" : {
              "bytes" : 512
            }
          }
        },
        "url" : {
          "original" : "/grok-demo"
        },
        "user_agent" : {
          "original" : "Mozilla/5.0"
        }
      }
    } ]
  }
}
If the event still arrives with legacy field names such as clientip, verb, request, response, and bytes, the pipeline or this filter is still using legacy pattern captures instead of ECS mode.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
