How to create a custom Nagios plugin

Custom Nagios Core plugins turn local checks, application metrics, and site-specific probes into scheduled service states. A plugin only has to print a status line to stdout and exit with the code that represents OK, WARNING, CRITICAL, or UNKNOWN, so a small shell script is enough for many private checks.
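As a minimal sketch of that contract, the hypothetical function below reports whether a file exists. The name check_file_present and its messages are illustrative assumptions, not part of Nagios Core. Only the single status line on stdout and the return codes 0 to 3 reflect the plugin contract.

```shell
#!/bin/sh
# Hypothetical example: report whether a file exists, following the plugin
# contract of one status line on stdout plus a return code of 0, 2, or 3.
check_file_present() {
    if [ "$#" -ne 1 ]; then
        echo "UNKNOWN - usage: check_file_present FILE"
        return 3
    fi
    if [ -e "$1" ]; then
        echo "OK - $1 exists"
        return 0
    fi
    echo "CRITICAL - $1 is missing"
    return 2
}
```

In a standalone plugin file, the function would be called with the script arguments and its return code passed to exit.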

Packaged and source-built Nagios Core systems can use different plugin and object-definition directories. Match the plugin path and object directory to the active nagios.cfg file before saving a command definition, because Nagios Core only loads files reached from that configuration tree.
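One way to see which paths the active configuration actually reaches is to list the relevant directives from the main configuration file. The helper name list_config_paths is an assumption made for this sketch. The directives cfg_file, cfg_dir, and resource_file are standard nagios.cfg options.

```shell
#!/bin/sh
# Print the directives in a main config file that decide which object files
# and resource files Nagios Core reads. Takes the nagios.cfg path as $1.
# Commented-out lines are skipped because the pattern is anchored at line start.
list_config_paths() {
    grep -E '^(cfg_file|cfg_dir|resource_file)=' "$1"
}
```

Running list_config_paths /etc/nagios4/nagios.cfg on a Debian or Ubuntu package install should show the object directories to choose from. The $USER1$ macro that points at the plugin directory is normally set in the file named by resource_file.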

A queue-depth check gives the plugin a concrete metric, two thresholds, status text, and optional performance data without depending on an external application. Replace the sample metric file with the file, socket, API call, or local command that represents the application condition on the monitoring server.
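As an illustration of a real metric source, the sketch below assumes a hypothetical application that queues work as files in a spool directory. The function name write_queue_depth and both paths are assumptions. The write-then-rename step is there so that a reader never sees a half-written value.

```shell
#!/bin/sh
# Hypothetical producer: count regular files in a spool directory and publish
# the count to the metric file that the plugin reads.
# Usage: write_queue_depth SPOOL_DIR METRIC_FILE
write_queue_depth() {
    spool_dir=$1
    metric_file=$2
    count=0
    for entry in "$spool_dir"/*; do
        # An unmatched glob stays literal, so test each entry before counting.
        if [ -f "$entry" ]; then
            count=$((count + 1))
        fi
    done
    # Write to a temporary file next to the target, then rename it into place.
    tmp_file="$metric_file.tmp.$$"
    printf '%s\n' "$count" > "$tmp_file" && mv "$tmp_file" "$metric_file"
}
```

A cron job or the application itself could call a function like this, leaving the plugin unchanged.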

Steps to create a custom Nagios plugin:

Create the plugin file in the Nagios Core plugin directory.
```
$ sudoedit /usr/lib/nagios/plugins/check_queue_depth
```
Ubuntu and Debian package installs use /usr/lib/nagios/plugins. Use /usr/local/nagios/libexec on source installs that follow the upstream default layout.

Add the plugin script.

```
#!/bin/sh
# check_queue_depth: compare an integer read from a file against two thresholds.

# Exit codes that Nagios Core maps to service states.
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

# Report argument errors as UNKNOWN so they show up in the service output.
usage() {
    echo "UNKNOWN - usage: $0 --path FILE --warning N --critical N"
    exit "$UNKNOWN"
}

metric_path=
warning=
critical=

# Parse the long options; each one requires a value.
while [ "$#" -gt 0 ]; do
    case "$1" in
        --path)
            [ "$#" -ge 2 ] || usage
            metric_path=$2
            shift 2
            ;;
        --warning)
            [ "$#" -ge 2 ] || usage
            warning=$2
            shift 2
            ;;
        --critical)
            [ "$#" -ge 2 ] || usage
            critical=$2
            shift 2
            ;;
        *)
            usage
            ;;
    esac
done

if [ -z "$metric_path" ]; then
    usage
fi

# Each threshold must be a non-empty string of digits. Checking them one at a
# time also rejects values that contain a colon, such as 7:0.
for threshold in "$warning" "$critical"; do
    case "$threshold" in
        ""|*[!0-9]* ) usage ;;
    esac
done

if [ "$warning" -ge "$critical" ]; then
    echo "UNKNOWN - warning threshold must be lower than critical threshold"
    exit "$UNKNOWN"
fi

if [ ! -r "$metric_path" ]; then
    echo "UNKNOWN - cannot read $metric_path"
    exit "$UNKNOWN"
fi

# Read the first line of the metric file and require an unsigned integer.
IFS= read -r queue_depth < "$metric_path" || queue_depth=
case "$queue_depth" in
    ""|*[!0-9]* )
        echo "UNKNOWN - $metric_path does not contain an integer"
        exit "$UNKNOWN"
        ;;
esac

# Test the critical threshold first so the most severe state wins.
if [ "$queue_depth" -ge "$critical" ]; then
    echo "CRITICAL - queue depth is $queue_depth | queue_depth=$queue_depth;$warning;$critical;0;"
    exit "$CRITICAL"
elif [ "$queue_depth" -ge "$warning" ]; then
    echo "WARNING - queue depth is $queue_depth | queue_depth=$queue_depth;$warning;$critical;0;"
    exit "$WARNING"
fi

echo "OK - queue depth is $queue_depth | queue_depth=$queue_depth;$warning;$critical;0;"
exit "$OK"
```

The text before | becomes the service status output. The text after | is optional performance data in the format label=value;warning;critical;minimum;maximum.
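To make the two halves of the output line concrete, the following sketch splits a line at the first | and prints the performance data fields by name. The function split_plugin_output is a hypothetical helper written for this illustration, and it assumes a single performance data item like the one the plugin above emits.

```shell
#!/bin/sh
# Split one plugin output line at the first "|" into status text and
# performance data, then print the perfdata fields one per line.
# Assumes at most one perfdata item of the form label=value;warn;crit;min;max.
split_plugin_output() {
    line=$1
    case "$line" in
        *"|"*)
            status=${line%%"|"*}
            perfdata=${line#*"|"}
            ;;
        *)
            status=$line
            perfdata=
            ;;
    esac
    # Trim the single space on each side of the separator.
    status=${status% }
    perfdata=${perfdata# }
    echo "status=$status"
    [ -n "$perfdata" ] || return 0
    label=${perfdata%%=*}
    values=${perfdata#*=}
    # Split the value list on ";" with globbing disabled.
    old_ifs=$IFS
    IFS=';'
    set -f
    set -- $values
    set +f
    IFS=$old_ifs
    echo "label=$label"
    echo "value=${1-}"
    echo "warning=${2-}"
    echo "critical=${3-}"
    echo "minimum=${4-}"
    echo "maximum=${5-}"
}
```

For the OK line produced by the plugin, the maximum field comes out empty because the plugin leaves it unset.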

Make the plugin executable.

```
$ sudo chmod 0755 /usr/lib/nagios/plugins/check_queue_depth
```

Create a sample metric file for the plugin to read.

```
$ printf '12\n' | sudo tee /var/lib/nagios4/app-queue-depth
12
```

Give the nagios user ownership of the sample metric so that it stays readable even when root creates files with a restrictive umask.
```
$ sudo chown nagios:nagios /var/lib/nagios4/app-queue-depth
```
In production, point --path at the real metric source and keep credentials outside the plugin file when possible.

Run the plugin manually as the nagios user.
```
$ sudo -u nagios /usr/lib/nagios/plugins/check_queue_depth --path /var/lib/nagios4/app-queue-depth --warning 70 --critical 90
OK - queue depth is 12 | queue_depth=12;70;90;0;
```
Running as nagios catches file-permission, interpreter, and environment problems before the scheduler starts using the plugin.
Related: How to run a Nagios plugin manually

Confirm the OK exit code immediately after the previous plugin run, before any other command overwrites $?.
```
$ echo $?
0
```
Nagios Core maps exit code 0 to OK, 1 to WARNING, 2 to CRITICAL, and 3 to UNKNOWN.
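The mapping can be written down as a small lookup, which is handy when reading exit codes during manual tests. The helper name state_name is an assumption made for this sketch.

```shell
#!/bin/sh
# Translate a plugin exit code into the state name from the mapping above.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        # Codes outside 0-3 are not part of the mapping described above.
        *) echo "out of range" ;;
    esac
}
```

After a manual plugin run, state_name "$?" prints the state that corresponds to the exit code.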

Set the sample metric to a warning value.

```
$ printf '75\n' | sudo tee /var/lib/nagios4/app-queue-depth
75
```

Run the plugin to confirm the WARNING branch.

```
$ sudo -u nagios /usr/lib/nagios/plugins/check_queue_depth --path /var/lib/nagios4/app-queue-depth --warning 70 --critical 90
WARNING - queue depth is 75 | queue_depth=75;70;90;0;
```

Set the sample metric to a critical value.

```
$ printf '95\n' | sudo tee /var/lib/nagios4/app-queue-depth
95
```

Run the plugin to confirm the CRITICAL branch.

```
$ sudo -u nagios /usr/lib/nagios/plugins/check_queue_depth --path /var/lib/nagios4/app-queue-depth --warning 70 --critical 90
CRITICAL - queue depth is 95 | queue_depth=95;70;90;0;
```

Set the sample metric to an invalid value.

```
$ printf 'busy\n' | sudo tee /var/lib/nagios4/app-queue-depth
busy
```

Run the plugin to confirm the UNKNOWN branch.

```
$ sudo -u nagios /usr/lib/nagios/plugins/check_queue_depth --path /var/lib/nagios4/app-queue-depth --warning 70 --critical 90
UNKNOWN - /var/lib/nagios4/app-queue-depth does not contain an integer
```

UNKNOWN is for invalid arguments, unreadable input, malformed local data, or an internal plugin failure that prevents a meaningful check result.
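The four manual runs can also be repeated as a small regression check. The helper below is a sketch under the assumption that the plugin accepts the same --path, --warning, and --critical options as the script in this guide. The name expect_exit and the scratch file path in the comments are hypothetical.

```shell
#!/bin/sh
# Write VALUE to METRIC_FILE, run PLUGIN against it, and compare the exit code
# with EXPECTED. The thresholds 70 and 90 match the manual runs above.
# Usage: expect_exit PLUGIN METRIC_FILE VALUE EXPECTED
expect_exit() {
    plugin=$1
    metric_file=$2
    value=$3
    expected=$4
    printf '%s\n' "$value" > "$metric_file"
    actual=0
    "$plugin" --path "$metric_file" --warning 70 --critical 90 > /dev/null || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "pass: value '$value' exited with $actual"
    else
        echo "FAIL: value '$value' exited with $actual, expected $expected"
        return 1
    fi
}

# Example calls, using a scratch file instead of the live metric:
#   expect_exit /usr/lib/nagios/plugins/check_queue_depth /tmp/queue-depth-test 12 0
#   expect_exit /usr/lib/nagios/plugins/check_queue_depth /tmp/queue-depth-test 75 1
#   expect_exit /usr/lib/nagios/plugins/check_queue_depth /tmp/queue-depth-test 95 2
#   expect_exit /usr/lib/nagios/plugins/check_queue_depth /tmp/queue-depth-test busy 3
```

Using a scratch file keeps the test from overwriting the metric that a scheduled check may be reading.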

Reset the sample metric to an OK value for the scheduled service check.
```
$ printf '12\n' | sudo tee /var/lib/nagios4/app-queue-depth
12
```
Create a local object file for the command and service.
```
$ sudoedit /etc/nagios4/conf.d/queue-depth.cfg
```

Add the command and service definitions.

```
define command {
    command_name    check_queue_depth
    command_line    $USER1$/check_queue_depth --path /var/lib/nagios4/app-queue-depth --warning $ARG1$ --critical $ARG2$
}

define service {
    use                    generic-service
    host_name              localhost
    service_description    Queue Depth
    check_command          check_queue_depth!70!90
}
```

$ARG1$ and $ARG2$ come from the bang-separated values in check_command. Replace localhost with the host object that should own the custom service.
Related: How to add a service check in Nagios Core
Related: How to use Nagios Core macros in a command
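A rough way to picture the bang-separated arguments is to split the check_command value on ! by hand. The sketch below only illustrates the mapping and is not how Nagios Core implements macro expansion. The helper split_check_command is hypothetical.

```shell
#!/bin/sh
# Split a check_command value on "!" and show which piece becomes the command
# name and which pieces would fill $ARG1$, $ARG2$, and so on.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f
    set -- $1
    set +f
    IFS=$old_ifs
    echo "command_name=${1-}"
    [ "$#" -gt 0 ] && shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
}
```

For check_queue_depth!70!90 this prints the command name followed by ARG1=70 and ARG2=90, which matches the --warning and --critical values used in the manual runs.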

Validate the Nagios Core configuration.

```
$ sudo nagios4 -v /etc/nagios4/nagios.cfg
Nagios Core 4.4.6
##### snipped #####
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...
##### snipped #####
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
```

Do not reload Nagios Core while Total Errors is greater than 0. Fix the first reported object or command error, then run the verifier again.
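For scripted deployments, the error count can be pulled out of the verifier output before deciding whether to reload. This is a sketch that assumes the summary line keeps the Total Errors: format shown above. The helper name total_errors is hypothetical.

```shell
#!/bin/sh
# Read verifier output on stdin and print the number after "Total Errors:".
# Prints nothing if the summary line is missing.
total_errors() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}
```

Piping the output of sudo nagios4 -v /etc/nagios4/nagios.cfg into total_errors should print 0 for the run shown above. A wrapper script could treat any other result, including empty output, as a reason to skip the reload.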

Reload Nagios Core to load the new command and service objects.
```
$ sudo systemctl reload nagios4
```
Ubuntu and Debian package installs use the nagios4 service name. In containers without systemd, send HUP to the running nagios4 process or use the control method supplied by the image.
Related: How to manage the Nagios Core system service

Check the service result in Nagios Core.
```
Queue Depth
Current Status: OK
Status Information: OK - queue depth is 12
Performance Data: queue_depth=12;70;90;0;
```
The Queue Depth service should leave PENDING after its next active check. Force one service check if the scheduler has not run it yet.
Related: How to reschedule an active check in Nagios Core
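One way to force the check from a shell is the SCHEDULE_FORCED_SVC_CHECK external command. The sketch below only builds the command line. The helper name forced_check_line is hypothetical, and it assumes that external commands are enabled in nagios.cfg.

```shell
#!/bin/sh
# Build the external command line that asks Nagios Core to run one service
# check now. Usage: forced_check_line HOST SERVICE_DESCRIPTION
forced_check_line() {
    now=$(date +%s)
    # Format: [timestamp] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$now" "$1" "$2" "$now"
}
```

Writing the output of forced_check_line localhost "Queue Depth" to the file named by command_file in nagios.cfg submits the request. On Debian and Ubuntu packages that file is assumed to be /var/lib/nagios4/rw/nagios.cmd, so confirm the path in the active configuration first.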

Author: Mohd Shakir Zakaria