Server Farmer provides many monitoring capabilities, using many different systems.
Heartbeat is a Server Farmer subproject (divided into a client part, a server part and the sf-monitoring-heartbeat extension) that extends the functionality of your chosen monitoring/alerting solution by adding the ability to monitor:
- services listening on known ports
- running Docker containers
- running libvirt-based virtual machines
- SMART for local drives (also SAS and all drives connected through hardware RAID controllers)
- free space under critical directories (eg. /var/lib/mysql - directories are detected automatically)
- mounted LUKS encrypted drives (or just any device mapper based)
- custom conditions defined per monitored host
Heartbeat can work with any monitoring/alerting system that supports http(s) keyword monitoring, including:
- public: StatusCake, Uptimerobot, Pingdom etc.
- local: Nagios, Icinga, Zabbix, PRTG etc.
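To illustrate what http(s) keyword monitoring means in practice, here is a minimal Python sketch of the kind of check such a monitoring system performs: fetch a URL and look for an expected keyword in the response body. The function names and the idea of passing the keyword as a parameter are assumptions made for this example; they are not taken from Heartbeat itself.

```python
import urllib.request


def body_is_healthy(body: str, keyword: str) -> bool:
    """Return True when the expected keyword appears in the response body."""
    return keyword in body


def check_url(url: str, keyword: str, timeout: float = 10.0) -> bool:
    """Fetch url and apply the keyword test; a network or HTTP error counts as failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
    return body_is_healthy(body, keyword)
```

A monitoring system would call something like `check_url(...)` on a schedule and raise an alert when it returns `False`.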
See the links above for detailed documentation.
SNMP generic monitoring
sf-monitoring-snmpd extension provides automatic installation and configuration of the snmpd daemon (net-snmp on RHEL), compatible with any SNMP monitoring system, including at least:
- PRTG Network Monitor
Apart from generic SNMP monitoring, sf-monitoring-cacti provides additional capabilities:
- OpenVZ and LXC containers monitoring (network traffic, memory and disk usage)
- MTA queue monitoring
- CPU thermal monitoring (compatible with generic lm-sensors + some hardware not compatible with lm-sensors, eg. QNAP, Fit-PC2, Raspberry Pi)
- external temperature monitoring using TemperNTC USB dongles
- SMART metrics monitoring
- possibility of swapping drives between servers without any configuration changes in Cacti
- automatic mapping of current drive letters to graphs (for systems with multiple hard drives and dynamic drive letter assignments)
These additional capabilities require ssh connectivity from the monitored servers to the Cacti server. Each monitored server has its own ssh private key, whose public key has to be manually accepted on the Cacti server (and can be immediately rejected in case of security issues etc.).
sf-monitoring-newrelic extension provides NewRelic license key configuration.
sf-monitoring-mysql, sf-monitoring-smart and sf-monitoring-backup extensions provide several custom checks, integrating with the NewRelic platform using dedicated dashboards.
SMART drive health monitoring
The Heartbeat subproject and the sf-monitoring-heartbeat extension together provide universal SMART drive monitoring, which can report current drive health to 3 different targets from a single SMART read:
- Heartbeat server
- Cacti (if sf-monitoring-cacti extension is also installed)
- NewRelic (if sf-monitoring-newrelic extension is also installed and NewRelic license key is configured)
Heartbeat automatically detects all local drives, even ones not supported by udev:
- SATA drives connected directly, via USB or eSATA (including through a port multiplier), or even passed through from a hypervisor to a virtual machine
- SATA/SAS drives connected to MegaRAID controller
- SATA/SAS drives connected to any custom hardware controller, assuming that such drives are exposed via /dev/sg* interfaces
Our health monitoring algorithm is based on our own experience from running our own online backup business, and on knowledge published by Backblaze and Google. You can find a detailed description of the tested SMART attributes and other details here.
External hard drive overheating prevention
Many people use external hard drives, or external hard drive enclosures (connected via USB or sometimes eSATA/Thunderbolt), which allow very cheap data storage for backup/archival purposes. Unfortunately such devices tend to overheat, and eventually fail, when working continuously for too long. While most such devices have hardware SCT (standby condition timer) protection, it is active by default only in Windows, and only when using drivers provided by the manufacturer.
sf-standby-monitor extension provides a simple mechanism that warns every 30 minutes if there are USB-attached drives that are not in standby mode.
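The check behind such a warning can be sketched as follows. This is not the extension's actual code: it assumes the drive power state is read with `hdparm -C` (which needs root and may not work through every USB bridge), and the function names are hypothetical.

```python
import subprocess


def parse_drive_state(hdparm_output: str) -> str:
    """Extract the power state from `hdparm -C` output, or 'unknown' if absent."""
    for line in hdparm_output.splitlines():
        if "drive state is:" in line:
            return line.split(":", 1)[1].strip()
    return "unknown"


def is_spun_down(device: str) -> bool:
    """Run `hdparm -C` on device and report whether it is in standby or sleeping."""
    result = subprocess.run(
        ["hdparm", "-C", device], capture_output=True, text=True, check=False
    )
    return parse_drive_state(result.stdout) in ("standby", "sleeping")
```

A periodic job could call `is_spun_down()` for each USB-attached device and send a warning for every drive that returns `False`.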
Disk usage monitoring
sf-farm-inspector extension provides several farm health analysis tools, including disk usage monitoring. This extends the free disk space monitoring provided by any SNMP monitoring software by giving you insight into what exactly in your filesystem takes up so much space.
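As an illustration of this kind of analysis (not the extension's own implementation), the following Python sketch sums file sizes per immediate subdirectory and lists the largest ones. The function names are hypothetical, and apparent file sizes are used rather than allocated blocks.

```python
import os


def directory_sizes(root: str) -> dict:
    """Map each immediate subdirectory of root to the total size (bytes) of files under it."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes


def largest_directories(root: str, count: int = 5) -> list:
    """Return the count largest subdirectories as (name, bytes) pairs, biggest first."""
    ranked = sorted(directory_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:count]
```

Calling `largest_directories("/var")`, for example, would show which subdirectories of `/var` hold the most data.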
Public IP change monitoring
For many companies, it is perfectly sufficient to have their LAN behind NAT, with a public IP address and remote access from outside, even though this public IP address isn't fixed, especially when their Internet connection is of very good quality and a fixed IP address is expensive.
There already exist services for such companies, eg. noip.com, but they are slow and paid (or have restrictions in free mode).
sf-ip-monitor extension provides the capability of monitoring public IP changes, eg. every minute, and alerting about a detected change using email and/or SMS messages. Emails sent by this extension are easy to parse and process by any external system that you may want to use, if you manage eg. hundreds of customers without a fixed IP.
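The change detection itself can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the extension's code: the state file and the function name are hypothetical, and how the current public IP is obtained is deliberately left to the caller.

```python
from pathlib import Path


def detect_ip_change(current_ip: str, state_file: Path):
    """Compare current_ip with the last recorded address.

    Returns (previous_ip, current_ip) when the address changed (previous_ip is
    None on the very first run), or None when nothing changed.
    """
    previous = state_file.read_text().strip() if state_file.exists() else None
    if previous == current_ip:
        return None
    state_file.write_text(current_ip + "\n")  # remember the new address
    return (previous, current_ip)
```

Run from a scheduler every minute, a non-`None` result would be the trigger for sending the email and/or SMS alert.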
sf-sms-smsapi extension provides the ability to send paid (cheap), prioritized SMS messages that won't get lost in case of GSM network congestion etc.
Syslog events monitoring
sf-log-monitor configures the logcheck tool, which scans syslog (and other) logs from your server (and possibly from other servers in the farm) and notifies the system administrator every hour about any unknown, possibly suspicious events. It is used to enhance the overall security level of your server/farm.
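The underlying idea, reporting only the log lines that no known pattern accounts for, can be sketched in Python as below. This is an illustration of the principle only: logcheck has its own rule files and behavior, and the function name and the example patterns here are hypothetical.

```python
import re


def unknown_events(lines, ignore_patterns):
    """Return the log lines that match none of the known (ignorable) patterns."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [
        line for line in lines
        if not any(rx.search(line) for rx in compiled)
    ]


# Hypothetical patterns describing events already known to be harmless.
KNOWN_HARMLESS = [
    r"CRON\[\d+\]: \(root\) CMD",
    r"sshd\[\d+\]: Accepted publickey for \w+",
]
```

Everything returned by `unknown_events(log_lines, KNOWN_HARMLESS)` is what would end up in the hourly report to the administrator.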