I'll move a question that Peter asked via email here for others to see:
What are you using for alerting?
i.e. what are you using to alert on performance issues or failures on top of your Prometheus and Grafana stack? Is it something like Alertmanager? What kinds of things do you peeps look out for at SARAO?