Recently, one of our web servers fell victim to an apparent DoS attack: it was hammered with hundreds of simultaneous dynamic page requests, far more than it’s specced to handle. To its credit, it stayed up, although it took about five minutes to log in via ssh, and when we spotted what was happening the load average was over 100, which I think is the highest I’ve ever seen. The offending IP address was at UCSD, which does a lot of bioinformatics, so there is a chance it was a misguided attempt to scrape a lot of data from us rather than an actual hostile act. Either way, the upshot is the same.
It worried me that there was no easy way to remotely check the machine’s health, so I hacked together a quick PHP page to report various vital statistics on demand (load average, memory usage, disk usage etc.) and a Perl monitor that can raise the alarm if anything exceeds safe bounds.
The status script itself really is dead simple. It looks like this:
<?php
# status.php -- very simple server status monitor

header( 'Content-type: text/plain' );

# Get and display the 1-, 5- and 15-minute load averages
$load = sys_getloadavg();
echo "LoadAverage1: $load[0]\n";
echo "LoadAverage5: $load[1]\n";
echo "LoadAverage15: $load[2]\n";

# Get and display all sorts of memory usage info (Linux only)
echo join( '', file( '/proc/meminfo' ) );

# Get and display disk usage percentages
$df = `/bin/df`;
# explode() replaces split(), which was removed in PHP 7
foreach( explode( "\n", $df ) as $line ) {
    # Match the "NN%" column and the mount point at the end of the line
    if( preg_match( "/(\d+%)\s+(\S+)$/", $line, $matches ) ) {
        $fs = $matches[ 2 ];
        $usage = $matches[ 1 ];
        echo "Usage_$fs: $usage\n";
    }
}

# Count running processes (the count includes the header line printed by ps)
$procs = trim( `/bin/ps -e|wc -l` );
echo "RunningProcesses: $procs\n";
?>
It uses PHP’s built-in sys_getloadavg function to return the load average, df to get the disk usage, and ps and wc to count the number of processes running. These should work on any Unix-ish system (let me know if they don’t!). It also reads /proc/meminfo to report lots of metrics about memory use, and that part is Linux-specific. It produces output that looks like this:
LoadAverage1: 0.59
LoadAverage5: 0.4
LoadAverage15: 0.29
MemTotal: 2021816 kB
MemFree: 337592 kB
Buffers: 35408 kB
Cached: 286676 kB
SwapCached: 3152 kB
Active: 1115608 kB
Inactive: 102288 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2021816 kB
LowFree: 337592 kB
SwapTotal: 4192956 kB
SwapFree: 4185596 kB
Dirty: 304 kB
Writeback: 0 kB
AnonPages: 893740 kB
Mapped: 119900 kB
Slab: 256036 kB
PageTables: 21240 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 5203864 kB
Committed_AS: 1433344 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 280284 kB
VmallocChunk: 34359458043 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Usage_/: 48%
Usage_/var: 9%
Usage_/tmp: 1%
Usage_/dev/shm: 0%
Usage_/export/people: 16%
Usage_/home/bsm: 79%
Usage_/LINUX/local64: 94%
Usage_/cath/opt: 13%
Usage_/cath/svnbin: 23%
Usage_/nfs/mail: 82%
Usage_/LINUX/local: 89%
RunningProcesses: 149
The Perl monitoring script is a bit more complex, so I’ve made it available for download here. It lets you set up a config file with rules specifying named fields from the PHP script’s output, along with maximum and/or minimum allowable values for them. From the script’s comments:
#!/usr/bin/perl
# server_status_check.pl
# Andrew Clegg
#
# This script parses the output of status.php and compares it to a
# list of minimum and maximum allowable values for server resources
# specified in a config file. The config file contains one rule per
# line, like so:
#
# min MemFree 7500000
# min Usage_/LINUX/local64 95
# max LoadAverage5 0.1
#
# Any line not in this format causes an error. Do not include any
# percent signs, units (e.g. kB) etc. in the config file; these
# are automatically stripped out from the results of status.php
# before applying the rules.
#
# For each resource that is lower than a min value or larger than
# a max value, a warning is printed. Also, if the config file
# contains any rules which name resources that are not found in
# the output of status.php at all, a warning is printed for each.
#
# It returns 0 if everything is fine, 255 if an error occurred, or
# the number of warnings issued if one or more of the resource
# rules are violated.
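To make the rule format concrete, here is a rough shell-and-awk sketch of the same checks. It is not the Perl script and should not be read as how that script is implemented: the function name check_rules, the idea of reading a saved copy of the report from a file rather than fetching a URL, and the wording of the bad-rule message are all assumptions made for illustration. It also assumes the saved report is not empty.

```shell
#!/bin/sh
# Illustrative sketch only; the real server_status_check.pl is not reproduced here.
# check_rules STATUS_FILE RULES_FILE   (hypothetical helper)
#   STATUS_FILE: saved output of status.php
#   RULES_FILE:  one "min|max NAME LIMIT" rule per line
check_rules() {
    awk '
        # Pass 1 (status report): remember each "Name: value [unit]" line.
        NR == FNR {
            if (NF >= 2) {
                name = $1
                sub(/:$/, "", name)           # "MemFree:" becomes "MemFree"
                value = $2
                gsub(/[^0-9.]/, "", value)    # drop "%" and similar
                reported[name] = value
            }
            next
        }
        # Pass 2 (rules): anything that is not "min|max NAME LIMIT" is an error.
        NF != 3 || ($1 != "min" && $1 != "max") {
            print "Bad rule on line " FNR ": " $0
            bad = 1
            exit
        }
        # A rule naming a field that the report lacks counts as a warning.
        !($2 in reported) {
            print $2 " not found in server status report"
            warnings++
            next
        }
        $1 == "min" && reported[$2] + 0 < $3 + 0 {
            print $2 " has value " reported[$2] " which is less than minimum " $3
            warnings++
        }
        $1 == "max" && reported[$2] + 0 > $3 + 0 {
            print $2 " has value " reported[$2] " which is greater than maximum " $3
            warnings++
        }
        # Exit status: 255 on error, otherwise the warning count (capped below 255).
        END {
            if (bad) exit 255
            exit (warnings > 254 ? 254 : warnings + 0)
        }
    ' "$1" "$2"
}
```

Assuming curl is installed, you could try it with something like `curl -s http://my.server/status.php > status.txt` followed by `check_rules status.txt my.config.file; echo $?`.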
You invoke the monitor script like this:
./server_status_check.pl http://my.server/status.php my.config.file
And it returns output that looks like this if any rules from the config file are violated:
MemFree has value 229472 which is less than minimum 250000
Usage_/ has value 86 which is greater than maximum 50
SomeIncorrectVariableName not found in server status report
Since the return code is non-zero in case of a problem, you can easily use it in a cron job or shell script to take action when a server’s vital statistics move into dangerous ranges.
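As one way of doing that, here is a minimal wrapper sketch in plain sh. The function names describe_exit_code and run_monitor are made up for this example, and the messages are my own wording; only the exit code convention (0 for fine, 255 for an error, otherwise the number of warnings) comes from the monitor script’s comments.

```shell
#!/bin/sh
# Hypothetical wrapper around the monitor; the names below are illustrative.

# Turn the monitor's exit code into a one-line summary.
describe_exit_code() {
    case "$1" in
        0)   echo "OK" ;;
        255) echo "ERROR: the monitor itself failed" ;;
        *)   echo "WARNING: $1 rule(s) violated" ;;
    esac
}

# Run the command given as arguments, stay silent on success, and print
# a summary plus the command's own output when the exit code is non-zero.
run_monitor() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        describe_exit_code "$status"
        printf '%s\n' "$output"
    fi
    return "$status"
}
```

To use it, you would add a final line such as `run_monitor ./server_status_check.pl http://my.server/status.php my.config.file` and call the wrapper from cron. Because it prints nothing when all is well, and cron normally mails any output a job produces to the crontab’s owner (or to MAILTO, if set), you should only hear from it when something needs attention.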
Of course, since the status script is plain PHP, you can also run it from the command line for a quick summary of the local machine’s resources by just typing:
php status.php
There are plenty of more complex server monitoring tools out there, but you probably have to be a skilled sysadmin to use them, whereas these tools took a few hours to write and take five minutes to install. As usual, suggestions are welcome, and you are free to use them wherever and however you like, but please credit me and include a link back here.
Andrew.