Sunday, 29 April 2018

systemd reboot threshold limit

I'm working on a commercial product that runs a camera service. This service is critical to the normal operation of the system. So far it is going well, and I'm able to restart the service when it fails due to low-level protocol/driver issues. Here is a snippet from the .service unit file that handles the service restart and reboot logic.

...
[Service]
Restart=on-failure
StartLimitInterval=2min
StartLimitBurst=5
StartLimitAction=reboot-force
...

Under certain conditions (for example, bus rail faults), it is quite possible that no number of reboots will recover the system. In that situation we want to stop rebooting the device (repeated reboots could be annoying to the user) and stop all attempts to recover the camera pipelines. This can be achieved with a monitoring service that keeps track of how many reboots the device has gone through and blocks further reboots once a limit is reached.
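
To make the monitoring idea concrete, here is a minimal sketch of such a reboot counter in Python. Everything in it is an assumption for illustration: the `RebootBudget` class, the counter file path, and the limit of 3 are hypothetical names and values, not anything systemd provides. The counter is kept in a file so that it survives reboots.

```python
import os
import tempfile


class RebootBudget:
    """Persistent count of recovery reboots, capped at `limit` (hypothetical helper)."""

    def __init__(self, path, limit=3):
        self.path = path      # file that survives reboots (assumed location)
        self.limit = limit    # maximum number of recovery reboots allowed

    def _read(self):
        # A missing or corrupt counter file is treated as "no reboots yet".
        try:
            with open(self.path) as f:
                return int(f.read().strip() or 0)
        except (FileNotFoundError, ValueError):
            return 0

    def _write(self, count):
        # Write to a temp file and rename, so a power cut cannot leave
        # a half-written counter behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write(str(count))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def try_consume(self):
        """Record one reboot attempt; return False once the limit is used up."""
        count = self._read()
        if count >= self.limit:
            return False
        self._write(count + 1)
        return True

    def reset(self):
        """Clear the counter, e.g. after the camera service has run healthily."""
        self._write(0)
```

A small script run when the camera service fails could call `try_consume()` and trigger the reboot only when it returns `True`, then call `reset()` once the service has stayed up for a while. How that script gets wired into the unit (for instance through `OnFailure=` instead of `StartLimitAction=`) is left open here.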

The other option I thought of is to rely on systemd itself, instead of adding another monitoring service for this purpose alone (which in turn would have to be monitored by systemd). I have spent some time looking through the systemd options, documentation, and examples to see whether such a reboot threshold exists. I'm looking for a way to restrict the number of reboots to some configurable limit, for example a StartLimitReboot option.

tl;dr

I want to achieve something like this:

...
[Service]
... 
...
... 
# hypothetical option: stop rebooting after this limit
StartLimitReboot=3
...

It looks like systemd doesn't support such semantics as of now, but if it did, that would simplify my task substantially.
