

- Get some sort of resource monitor running on the machine to collect time-series data about your procs, preferably shipped to another machine so the data survives if this one hangs. Prometheus is simple enough, but SigNoz and Uptrace are DataDog-like alternatives if you want to go there.
- Identify what’s running out of control. Check CPU and memory usage (the most likely culprit is a memory leak).
- Check the logs to see if something is obviously wrong.
- See if there is an update for that process that addresses the issue.
- If it’s a system service, set proper resource limits on it (e.g. MemoryMax= in its systemd unit).
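For the "identify what's running out of control" step, here is a minimal sketch, assuming a Linux host with /proc mounted, that lists the processes with the largest resident memory (VmRSS). The function names are hypothetical helpers for illustration; this is a one-off snapshot, not a substitute for the time-series history a monitor like Prometheus gives you.

```python
import os


def rss_kib(pid: int) -> int:
    """Return the resident set size of a process in KiB, or 0 if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # reported in kB
    except OSError:
        # Process exited or we lack permission; treat as no usage.
        pass
    # Kernel threads have no VmRSS line at all.
    return 0


def process_name(pid: int) -> str:
    """Return the short command name of a process, or '?' if unavailable."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return "?"


def top_memory(n: int = 5) -> list[tuple[int, int, str]]:
    """Return up to n (rss_kib, pid, name) tuples, largest RSS first."""
    pids = [int(p) for p in os.listdir("/proc") if p.isdigit()]
    usage = [(rss_kib(pid), pid, process_name(pid)) for pid in pids]
    usage.sort(reverse=True)
    return usage[:n]


if __name__ == "__main__":
    for rss, pid, name in top_memory():
        print(f"{pid:>7} {rss / 1024:>10.1f} MiB  {name}")
```

Running it a few times a minute apart and watching one PID's RSS keep climbing is a crude but quick way to spot a leak.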

In general, it’s not a runaway CPU that’s going to halt your machine, it’s memory exhaustion. A process taking too much memory should get OOM-killed by the kernel, but if you don’t have swap configured properly and don’t have enough RAM, the kernel may not free memory in time, and the machine can stall or hang before the OOM killer steps in.
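To check the swap and available-memory situation described above, a minimal sketch, again assuming Linux and /proc, that parses /proc/meminfo and reads a process's /proc/PID/oom_score (higher means the OOM killer is more likely to pick it). The helper names are hypothetical, and no particular threshold is implied as "safe".

```python
import os


def meminfo_kib() -> dict[str, int]:
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info


def memory_report() -> dict:
    """Summarize available RAM and whether any swap is configured."""
    info = meminfo_kib()
    total = info["MemTotal"]
    # MemAvailable exists on kernels >= 3.14; fall back to MemFree otherwise.
    available = info.get("MemAvailable", info["MemFree"])
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    return {
        "available_pct": 100.0 * available / total,
        "swap_configured": swap_total > 0,
        "swap_used_kib": swap_total - swap_free,
    }


def oom_score(pid: int) -> int:
    """Return the kernel's current OOM badness score for a process."""
    with open(f"/proc/{pid}/oom_score") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    report = memory_report()
    print(f"available RAM: {report['available_pct']:.1f}%")
    if not report["swap_configured"]:
        print("warning: no swap configured")
    print(f"oom_score of this script: {oom_score(os.getpid())}")
```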

It’s not the CPU. A runaway CPU-bound process will only burn cycles and raise your energy bill.