A single remote syslog destination can turn a collector outage into missing security events, delayed troubleshooting data, or a blocked forwarding path. rsyslog can send to a primary collector first and use a secondary collector only after the primary forwarding action is suspended.
Failover forwarding depends on ordered actions. The first omfwd action tries the primary TCP receiver, and the next action sets action.execOnlyWhenPreviousIsSuspended so it runs only while the previous action is suspended. Use TCP or RELP for this pattern: UDP is connectionless, so the sender gets no reliable signal that a remote collector has disappeared.
Keep the forwarding actions synchronous. Per-action queues make failover actions run asynchronously, which removes the failure feedback that the secondary action needs. If the forwarding pipeline needs buffering, put the actions inside a queued ruleset and leave the individual primary and secondary actions on their default Direct queue behavior.
$ sudo ss -ltn 'sport = :514'
State    Local Address:Port
LISTEN   0.0.0.0:514
Run this check on each collector. The client-side failover rule can only prove a secondary path if both receivers are configured before the outage test starts.
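The listener check above runs on the collectors themselves. As a complementary sketch, the client can confirm that both receivers accept TCP connections before the outage test. The function names can_connect and check_collectors are hypothetical helpers, and the sketch assumes bash (for /dev/tcp) and the coreutils timeout command are available on the client.

```shell
# Sketch only: can_connect and check_collectors are illustrative helper names.

# can_connect HOST PORT
# Exit status 0 if a TCP connection opens within 3 seconds, non-zero otherwise.
can_connect() {
    local host="$1" port="$2"
    # /dev/tcp/HOST/PORT is a bash redirection feature, so bash is invoked explicitly.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_collectors PORT HOST...
# Prints one status line per host and returns non-zero if any host is unreachable.
check_collectors() {
    local port="$1" rc=0 host
    shift
    for host in "$@"; do
        if can_connect "$host" "$port"; then
            echo "reachable: ${host}:${port}"
        else
            echo "UNREACHABLE: ${host}:${port}"
            rc=1
        fi
    done
    return "$rc"
}
```

With the example hostnames from this guide, a call would look like: check_collectors 514 syslog-primary.example.net syslog-secondary.example.net. A successful connection only shows that something is listening on the port, not that the receiver stores the messages.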
$ sudo vi /etc/rsyslog.d/70-forwarding-failover.conf
global(workDirectory="/var/spool/rsyslog")
template(
    name="ForwardPlainText"
    type="string"
    string="%timestamp% %hostname% %syslogtag%%msg%\n"
)
ruleset(
    name="remoteFailover"
    queue.type="LinkedList"
    queue.filename="remote-failover"
    queue.saveOnShutdown="on"
) {
    action(
        type="omfwd"
        name="primary_collector"
        target="syslog-primary.example.net"
        port="514"
        protocol="tcp"
        template="ForwardPlainText"
        action.resumeRetryCount="0"
        action.reportSuspension="on"
    )
    action(
        type="omfwd"
        name="secondary_collector"
        target="syslog-secondary.example.net"
        port="514"
        protocol="tcp"
        template="ForwardPlainText"
        action.execOnlyWhenPreviousIsSuspended="on"
        action.resumeRetryCount="-1"
        action.reportSuspension="on"
    )
}
*.* call remoteFailover
The ruleset queue buffers the forwarding pipeline as a whole. Do not add queue.type, queue.filename, or other queue.* settings to the individual primary_collector or secondary_collector actions, because a per-action queue prevents the secondary action from seeing that the previous action is suspended.
Use the existing global(workDirectory=…) path if the system already sets one in /etc/rsyslog.conf. The path must exist and be writable by rsyslog before a named queue can store files.
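The work directory requirement can be checked before validating the configuration. The following is a minimal sketch; check_workdir is a hypothetical helper name, and the result is only meaningful when the function runs as the account that rsyslogd runs under, which varies by distribution.

```shell
# Sketch only: check_workdir is an illustrative helper name.

# check_workdir DIR
# Prints a status line; returns 0 only when DIR exists and is writable
# by the user running this function.
check_workdir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

For the path used in this guide, the call would be check_workdir /var/spool/rsyslog. Note that root passes the writability test for almost any directory, so a check run as root does not prove that an unprivileged rsyslog account can write there.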
$ sudo rsyslogd -N1
rsyslogd: version 8.2512.0
##### snipped #####
rsyslogd: End of config validation run. Bye.
$ sudo systemctl restart rsyslog
Restart after a clean syntax test. Some distro units do not expose a separate reload action for rsyslog.
Related: How to manage the syslog service
$ logger -t failover-test primary-path-ok
$ sudo cat /var/log/remote/client01.log
Jun 5 01:13:42 client01 failover-test: primary-path-ok
The receiver path depends on how the collector stores remote hosts. Use the file, index, or SIEM search that normally receives messages from the test client.
$ sudo systemctl stop rsyslog
Run this on a disposable primary receiver or during an approved test window only. Stopping the collector interrupts syslog intake from every client that depends on that receiver.
$ logger -t failover-test secondary-path-probe
$ logger -t failover-test secondary-path-ok
rsyslog may not detect an action failure before the same message reaches the next action. A probe followed by a second unique message avoids treating that detection boundary as a failed failover configuration.
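The probe-then-marker pattern can be scripted so that every test run uses messages that are easy to tell apart from earlier runs. This is a sketch under assumptions: unique_token and send_probe_pair are hypothetical helper names, the one-second pause between the two messages is an arbitrary choice rather than a documented rsyslog requirement, and od and /dev/urandom are assumed to be available on the client.

```shell
# Sketch only: unique_token and send_probe_pair are illustrative helper names.

# unique_token PREFIX
# Prints PREFIX, the epoch time, and 4 random bytes in hex, joined by dashes.
unique_token() {
    local prefix="$1" rand
    rand=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
    echo "${prefix}-$(date +%s)-${rand}"
}

# send_probe_pair TAG
# Sends a probe message, pauses briefly, then sends a marker message.
# Prints the marker so it can be searched for on the collector.
send_probe_pair() {
    local tag="$1" probe marker
    probe=$(unique_token probe)
    marker=$(unique_token marker)
    logger -t "$tag" "$probe"    # may be lost at the detection boundary
    sleep 1                      # assumed pause, not an rsyslog requirement
    logger -t "$tag" "$marker"   # the message the failover check looks for
    echo "$marker"
}
```

Calling send_probe_pair failover-test on the client prints the marker string. Searching for that exact string on the secondary collector avoids confusing the current run with leftovers from a previous test.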
$ sudo cat /var/log/remote/client01.log
Jun 5 01:13:47 client01 failover-test: secondary-path-ok
$ sudo systemctl start rsyslog
$ logger -t failover-test primary-restored
The default action resume interval is 30 seconds unless it is changed in the action configuration. Wait at least one interval before expecting the primary action to take over again.
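Rather than guessing how long to wait, the check on the collector can poll the log until the expected message appears or a timeout expires. The sketch below assumes the collector writes to a plain file; wait_for_line is a hypothetical helper name, and the 60-second default is simply twice the 30-second resume interval mentioned above.

```shell
# Sketch only: wait_for_line is an illustrative helper name.

# wait_for_line FILE TEXT [TIMEOUT_SECONDS]
# Polls FILE once per second for a line containing the literal TEXT.
# Returns 0 as soon as it is found, non-zero if the timeout expires first.
wait_for_line() {
    local file="$1" text="$2" timeout="${3:-60}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -qF -- "$text" "$file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # One last look after the final sleep.
    grep -qF -- "$text" "$file" 2>/dev/null
}
```

On the primary collector from this guide, a call could look like: wait_for_line /var/log/remote/client01.log primary-restored 90. If the file needs elevated privileges to read, run the function from a root shell. A message sent before the primary action resumes is delivered to the secondary, so if the wait times out, send a fresh message and check again.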
$ sudo cat /var/log/remote/client01.log
Jun 5 01:13:42 client01 failover-test: primary-path-ok
Jun 5 01:14:25 client01 failover-test: primary-restored
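The primary log should contain the initial primary-path-ok message and the primary-restored message. The secondary log should contain only messages sent while the primary collector was unavailable: secondary-path-ok, and possibly secondary-path-probe, depending on when rsyslog detected the failure.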
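The final comparison can be expressed as counts per message. The following sketch assumes file-based collector logs; count_marker and assert_count are hypothetical helper names, and the expected counts in the usage note follow the sample output shown in this guide.

```shell
# Sketch only: count_marker and assert_count are illustrative helper names.

# count_marker FILE TEXT
# Prints how many lines of FILE contain the literal TEXT (0 if FILE is unreadable).
count_marker() {
    local file="$1" text="$2" n
    n=$(grep -cF -- "$text" "$file" 2>/dev/null) || n=0
    echo "$n"
}

# assert_count FILE TEXT EXPECTED
# Prints ok or FAIL and returns non-zero when the count differs from EXPECTED.
assert_count() {
    local file="$1" text="$2" expected="$3" actual
    actual=$(count_marker "$file" "$text")
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: '$text' x$actual in $file"
    else
        echo "FAIL: '$text' expected $expected, found $actual in $file"
        return 1
    fi
}
```

On the primary collector, assert_count /var/log/remote/client01.log primary-restored 1 and assert_count /var/log/remote/client01.log secondary-path-ok 0 would both be expected to pass after a clean test run. On the secondary collector the expectation for secondary-path-ok is 1. Repeated test runs with the same message text raise the counts, so adjust the expected values or use unique message text per run.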