Using an ESP8266 to reset my hanging server

I recently encountered a rare but recurring issue with my TrueNAS server: during resource-intensive operations such as disk scrubs, it would sometimes hang and become unreachable over the network. The server is in a remote location, so rebooting it by hand every time this happened was not feasible, and because it is built from leftover computer parts, it has no IPMI or any other out-of-band management interface. This prompted me to look for another way to make sure I wouldn't lose access to the server if it hung.

I could have bought a KVM-over-IP device, or even built one myself, but I wanted to keep costs down and reuse some components I already had lying around. I therefore opted for a simple solution: an ESP8266 driving a relay connected to the motherboard's reset switch header. I programmed the ESP to act as a watchdog timer that toggles the relay whenever the server hangs.

Hardware

I connected the relay module to the ESP board, making sure to supply it with the 5V needed for the relays to switch:

A photo of the hardware

I then attached the ESP board to the relay board with some double-sided tape and installed the whole assembly inside the server's case. I connected the relay terminals to the motherboard's reset switch pins using a breakout cable:

Connecting to the reset switch on the motherboard

To power the ESP and allow communication with the server, I connected the ESP board to one of the motherboard's internal USB 2.0 headers using an adapter cable.

Software

I opted to use ESPHome for this project. ESPHome is an open-source project that makes it easy to configure and control ESP8266 and ESP32 devices for home automation using YAML configuration files.

In only 50 lines of YAML, I was able to configure the ESP to monitor its USB serial port for incoming data and, if no data is received within 30 minutes, trigger the relay to reset the server.

The important parts of the configuration are the script: and uart: sections. The script starts with a 30-minute delay and then toggles the relay. However, the script runs in restart mode (mode: restart), so starting it while it is already running cancels the pending delay and starts it again from zero. The UART component is configured to start the script whenever data is received on the serial port (using its debug: feature and a lambda function). As long as the server regularly writes something to the serial port, the delay keeps being restarted and the relay never triggers. If the server hangs and stops writing, the delay eventually runs out, the relay is toggled, and the server is reset.
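To make the timing concrete, here is a small Python model of the watchdog logic described above. This is not ESPHome code; it is a hypothetical simulation that assumes the script behaves like a resettable timer: every burst of serial data restarts the 30-minute delay, and the relay fires only if the delay completes. The names RestartModeWatchdog and first_reset are illustrative, time is counted in whole minutes, and the watchdog is assumed to stay unarmed until the first data arrives, since nothing else starts the script in this description.

```python
from dataclasses import dataclass
from typing import List, Optional

TIMEOUT_MIN = 30  # the script's delay before it toggles the relay


@dataclass
class RestartModeWatchdog:
    """Toy model of a script in restart mode: every start cancels the
    pending delay and begins a fresh one."""
    timeout: int = TIMEOUT_MIN
    deadline: Optional[int] = None  # None until the first data arrives

    def on_serial_data(self, now: int) -> None:
        # Restarting the script pushes the deadline out by a full timeout.
        self.deadline = now + self.timeout

    def relay_fires(self, now: int) -> bool:
        return self.deadline is not None and now >= self.deadline


def first_reset(heartbeats: List[int], horizon: int) -> Optional[int]:
    """Step through time one minute at a time and return the minute the
    relay fires, or None if it never fires within `horizon` minutes."""
    wd = RestartModeWatchdog()
    pending = sorted(heartbeats)
    for minute in range(horizon + 1):
        # Heartbeats are handled before the deadline check, so a write that
        # lands exactly on the deadline still counts (a real race in practice).
        while pending and pending[0] <= minute:
            wd.on_serial_data(pending.pop(0))
        if wd.relay_fires(minute):
            return minute
    return None


# Healthy server: cron writes every 10 minutes, so the relay never fires.
print(first_reset(list(range(0, 241, 10)), 240))  # None
# Server hangs at minute 45: last write at minute 40, reset at minute 70.
print(first_reset([0, 10, 20, 30, 40], 240))  # 70
```

With a 10-minute write interval, a hang is caught 30 minutes after the last successful write, which is 20 to 30 minutes after the hang itself. One missed write is harmless, but if two in a row are missed, the third arrives right at the deadline and it comes down to timing which happens first.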

Configuring the server

On the server side, I set up a cron job in the TrueNAS web interface that sends data to the serial port every 10 minutes. Since that interval is well below the ESP's 30-minute timeout, the delay never runs out while the server is healthy, and even a single missed run only leaves a 20-minute gap, which is still within the limit.

Configuring a cron job in the TrueNAS web interface
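The cron job only has to write a few bytes to the ESP's USB serial device; a plain echo redirected into the device node is likely enough. The Python sketch below is a hypothetical alternative that also sets the port's baud rate with the standard termios module, so that it matches whatever baud rate the ESP's UART is configured for. The device path (a USB-serial chip usually appears as /dev/ttyUSB0 on Linux-based TrueNAS SCALE, and as something like /dev/cuaU0 on FreeBSD-based TrueNAS CORE), the 9600 baud rate, and the send_heartbeat name are all assumptions.

```python
import os
import sys
import termios

# Assumed device path: a CH340/CP2102 USB-serial chip usually shows up as
# /dev/ttyUSB0 on Linux (TrueNAS SCALE); on FreeBSD (TrueNAS CORE) try /dev/cuaU0.
DEFAULT_PORT = "/dev/ttyUSB0"
# Assumed to match the baud rate configured for the ESP's UART.
BAUD = termios.B9600


def send_heartbeat(port: str = DEFAULT_PORT, message: bytes = b"ping\n") -> int:
    """Write one heartbeat message to the serial port; return bytes written."""
    # O_NOCTTY keeps the port from becoming the process's controlling terminal.
    fd = os.open(port, os.O_RDWR | os.O_NOCTTY)
    try:
        # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
        attrs = termios.tcgetattr(fd)
        attrs[1] &= ~termios.OPOST  # send bytes as-is (no "\n" -> "\r\n")
        attrs[4] = BAUD  # input speed
        attrs[5] = BAUD  # output speed
        termios.tcsetattr(fd, termios.TCSANOW, attrs)
        return os.write(fd, message)
    finally:
        os.close(fd)


if __name__ == "__main__":
    send_heartbeat(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PORT)
```

The cron job's command could then be something like python3 /path/to/heartbeat.py /dev/ttyUSB0, where the script path is a placeholder. The message content itself doesn't matter, since the ESP only watches for incoming data.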

Conclusion

By leveraging the ESP8266 microcontroller and the ESPHome platform, I was able to come up with a quick and easy solution to the problem without having to wait for parts or buy an expensive commercial product. The solution has been stable so far: the server has only been reset when it was actually hung, which has happened once.