Skipping the header row keeps CSV column names out of the destination index, which prevents misleading hits, broken aggregations, and one-off documents whose values are only the field labels.
The csv filter can learn field names from the first row it sees when autodetect_column_names is enabled, and skip_header then drops rows that exactly match that header instead of indexing them as events. Subsequent rows are parsed with the detected column names, so this works best when one filter instance handles one CSV schema.
Current Elastic documentation still requires the pipeline that runs this csv filter to use a single worker for header autodetection and skipping to behave correctly. With the file input, start_position only affects files that do not already have a recorded sincedb offset, so a file that has already been read may need a new filename or a carefully reset sincedb_path before Logstash sees the header again.
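For orientation, a minimal end-to-end pipeline that exercises this behavior could look like the following sketch. The input path, sincedb_path, and stdout output are illustrative assumptions, not part of the steps below.
input {
  file {
    path => "/var/log/users.csv"                       # hypothetical CSV source
    start_position => "beginning"                      # honored only when no sincedb offset exists
    sincedb_path => "/var/lib/logstash/sincedb-users"  # hypothetical; reset carefully to re-read a file
  }
}

filter {
  csv {
    autodetect_column_names => true  # first row seen defines the column names
    skip_header => true              # rows matching that header are dropped
  }
}

output {
  stdout { codec => rubydebug }  # stand-in output for testing
}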
Steps to skip CSV header rows in Logstash:
- Update the csv filter in /etc/logstash/conf.d/20-csv.conf to autodetect the header and skip it.
filter {
  csv {
    autodetect_column_names => true
    skip_header => true
  }
}
If you enable skip_header without autodetect_column_names, define columns explicitly (a variant is sketched after this step's notes). Repeated rows that exactly match the configured or autodetected header values are skipped too.
The first event seen by this filter becomes the header definition. If the pipeline reads multiple CSV layouts, keep each schema in its own filter instance or dedicated pipeline.
With the file input, start_position applies only on first contact with a file that has no existing sincedb record. If Logstash has already advanced past the header, ingest a new file or reset that input's sincedb state carefully before expecting the header row to be skipped.
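For the explicit-columns case mentioned above, a minimal sketch might look like this; the column names name, email, and role are placeholders for the actual header labels in the CSV.
filter {
  csv {
    columns => ["name", "email", "role"]  # placeholder names; must match the header labels exactly
    skip_header => true                   # drops rows whose values equal the configured column names
  }
}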
- Set pipeline.workers to 1 for the pipeline that runs this filter. For the default package-based main pipeline, add it to /etc/logstash/logstash.yml.
pipeline.workers: 1
If this CSV flow has its own entry in /etc/logstash/pipelines.yml, set pipeline.workers there instead of lowering workers for every pipeline on the host (an example entry follows these notes).
Reducing workers for the default main pipeline can lower throughput for unrelated pipelines on the same Logstash instance.
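A hedged example of such a dedicated entry in /etc/logstash/pipelines.yml, assuming a pipeline id of csv-users and the config file edited above:
- pipeline.id: csv-users    # hypothetical pipeline id
  path.config: "/etc/logstash/conf.d/20-csv.conf"
  pipeline.workers: 1       # single worker so header autodetection sees the header first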
- Test the pipeline configuration before restarting the service.
$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --path.data /tmp/logstash-configtest --config.test_and_exit
Using bundled JDK: /usr/share/logstash/jdk
Configuration OK
Run the test as the logstash user so the command uses the same permissions model as the service.
- Restart the Logstash service to apply the updated pipeline settings.
$ sudo systemctl restart logstash
Restarting Logstash briefly interrupts ingestion while the pipeline reloads.
- Check the Logstash service status for a running state.
$ sudo systemctl --no-pager status logstash
● logstash.service - logstash
     Loaded: loaded (/usr/lib/systemd/system/logstash.service; enabled; preset: enabled)
     Active: active (running) since Tue 2026-04-07 08:18:42 UTC; 6s ago
##### snipped #####
- Verify that the header values are no longer indexed.
$ curl -sG 'http://elasticsearch.example.net:9200/users-*/_count?pretty' \
    --data-urlencode 'q=name:"name" AND email:"email" AND role:"role"'
{
  "count" : 0,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
Replace the index pattern and the field/value pairs with the actual header labels from the CSV being ingested.
Header skipping only affects future events. Delete or reindex any previously ingested header documents separately if they already exist.
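One way to remove header documents that were indexed before this change is a query-based delete. This sketch reuses the placeholder host, index pattern, and field labels from the count check above; confirm the query matches only header rows before running it.
$ curl -s -X POST 'http://elasticsearch.example.net:9200/users-*/_delete_by_query?pretty' \
    -H 'Content-Type: application/json' \
    -d '{"query":{"query_string":{"query":"name:\"name\" AND email:\"email\" AND role:\"role\""}}}'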
