A Perl Script For Monitoring Apache Server Status

By Lincoln D. Stein

Web servers rarely occur singly. They tend to run in packs. If you're a typical Webmaster, you're not responsible for just one server, but for a whole herd. It's easy enough to manage one or two servers, but when there are a half-dozen or more, you really need some sort of central server-management tool to keep track of what's going on and alert you to problems. Ideally, such a tool would run at regular intervals, contacting each of the servers under your care, logging the results, and alerting you if it detects that a server is no longer responding to requests.

There are a variety of commercial tools for this purpose. The ones that readers of this column most often recommend include WebTrends Log Analyzer, which uses FTP and server-side scripts to collect server performance statistics from log files, and WindDance Networks WebChallenger, which uses packet sniffing to pull its information directly off the local area network.

However, I've always done these sorts of things myself on the theory that the tools you create yourself are most likely to meet your evolving needs. In this column, I present a short Perl script that I use to collect usage and status information from the remote and local Apache Web servers that I manage. With some modifications, this script can be turned into a full-featured Web analysis tool.

The Apache status Module

Versions 1.2 and later of the Apache Web server come with an optional extension module called mod_status. When mod_status is installed, you can fetch a Web page of dynamically generated server performance statistics by requesting a special URL. Various options let you increase the verbosity of the report or toggle between human-readable (HTML) and machine-readable (text-only) formats. Figure 1 shows an HTML report in verbose mode. The status data includes the length of time the server has been running, the host machine's load level, the total number of accesses, the number of bytes transferred, and the average number of bytes transferred with each request. Because these usage statistics are reset every time the server is restarted (which happens every time you make a configuration change), they are no substitute for log crunching; instead, they are intended to give you a snapshot of the server's current load and performance.

Before using the module, you must associate it with a URL. You should also put the URL under access control so that only authorized host machines can connect and fetch status information -- otherwise everyone in the world could view the status of your server and see what URL requests are currently being serviced. A typical excerpt from the server access.conf configuration file is shown below:

    ExtendedStatus On

    <Location /status>
        SetHandler server-status
        order deny,allow
        deny from all
        allow from localhost .yourdomain.com
    </Location>

This configuration turns on "extended" status reports, which provide more verbose summaries. It then associates the status report module with the URL path "/status", and limits access to this URL to requests from the machine "localhost" and the domain ".yourdomain.com". Provided that the request is made by one of the machines satisfying these requirements, fetching http://www.yourdomain.com/status will now produce the human-readable report. Appending the query string ?auto to this URL will result in a text-only version of the report similar to what is shown in Example 1. The next section develops a Perl script to fetch and parse the text-only report information.

The check_servers Script

This is a short Perl script that can be used to monitor a series of Apache Web servers. Run at regular intervals, it attempts to contact each Web server in turn. If successful, it fetches its status report, parses it, and writes the results out to a chronological log file. If unsuccessful, it sends an email message to the Webmaster to inform him or her of the problem.

Listing 1 shows the code for the program. It first brings in a number of library modules that it needs. Some are standard in the Perl distribution, and others are available from the Comprehensive Perl Archive Network (CPAN). The use strict command is a pragma that turns on strict syntax checking. IO::File provides object-oriented file access. LWP::Simple lets Perl act like a Web client and fetch Web pages. Net::SMTP provides an interface to email so that the script can send out warning messages. POSIX provides time and date formatting utilities. Lastly, Getopt::Long provides routines for processing command-line options. (Also see "Online.")

Lines 11 through 15 define variables that are visible to all parts of the script. DEFAULT_LOG and DEFAULT_URL are constants containing various runtime defaults. $SRVR_LIST, $ALERT, $LOG, and $URL are configuration variables set from the command line. %STATUS is a global that will be used to contain the parsed status reports from each server, and $TIME holds the current time and date (fetched using the POSIX strftime() function). %STATUS and $TIME are global so that they can be successfully processed by Perl's report-generation facility, which can't operate with local variables.

Everything through line 40 is concerned with fetching and parsing command-line arguments. You can pass the script a list of servers to check on the command line, or you can place the list of servers in a text file, one per line, and point the script to the file using the -servers command-line option. Servers can be specified by hostname alone, or by a combination of hostname and port in the format www.someserver.com:8000. Other options let you change the URL that the script uses to fetch the status report, specify the directory in which to store the log files, and give the email address of the person to be alerted in case of trouble. For example, if you wanted to keep the server status log files in the directory /var/adm/www/logs, to send email to webmaster@yourdomain.com, and to have the script read the list of servers to check from the file /etc/wwwservers, you would invoke the script like this:

    check_servers \
        -servers /etc/wwwservers \
        -log /var/adm/www/logs \
        -alert webmaster@yourdomain.com

After parsing the command-line options and setting up its runtime configuration, the script checks whether it needs to read the servers from a file (lines 42 to 46). If so, it opens up the file and reads them, adding the servers found there to the list provided on the command line, if any.
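The file-reading step on lines 42 to 46 takes every line of the file as a server name, so a blank line or a comment would be treated as a host. What follows is a minimal sketch of a more forgiving reader. The helper name read_server_list and the # comment convention are assumptions for illustration; neither appears in the original listing.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper (not part of the original listing): read server
# names from a file, one per line, ignoring blank lines and # comments.
sub read_server_list {
    my ($path) = @_;
    my $fh = IO::File->new($path, 'r')
        or die "Can't open $path: $!";
    my @servers;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/#.*//;            # strip comments
        $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        push @servers, $line if length $line;
    }
    $fh->close;
    return @servers;
}
```

In the listing, the returned list could be pushed onto @SERVERS in place of the chomp on line 44.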

Next (lines 48 to 53), the script enters a short loop. For each server on the list, it attempts to fetch the status report URL, using the get() function from LWP::Simple to connect to the remote server and retrieve the status report, which is then saved to a local variable named $content.

If the server is down, $content will be empty (get() returns an undefined value when the fetch fails). If the server is up but misbehaving, then $content may contain an error report of some sort. In either case, we attempt to parse the fields and values of the status report using a Perl pattern-match operation (line 50), and save the results into the global %STATUS hash. We then call the write_log() routine, which reads the global hash, to write out a log entry. If something goes wrong during the fetch, then %STATUS won't contain the values we expect. We check for a field that should always exist, the one named Total Accesses. If it's not present, we call the send_alert() function to alert the Webmaster via email.
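The pattern match can be tried in isolation on a sample report. The sketch below wraps it in a hypothetical parse_status() helper (the name is an assumption, not part of the listing) and adds an explicit check for an undefined $content, which the listing leaves to Perl's default handling.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a text-only status report into a list of
# (field, value) pairs. Uses the same pattern as line 50 of the listing.
sub parse_status {
    my ($content) = @_;
    return () unless defined $content;    # failed fetch: nothing to parse
    # With /g in list context the match returns every captured pair;
    # /m lets ^ and $ anchor at each line of the report.
    return $content =~ /^(.+): ([\d.Ee-]+)$/mg;
}

# Sample report, abbreviated from Example 1.
my $report = <<'END';
Total Accesses: 821
CPULoad: .305991
Uptime: 19762
Scoreboard: ______W_W_W..........
END

my %status = parse_status($report);
print "$_ = $status{$_}\n" for sort keys %status;
print "server looks down\n" unless exists $status{'Total Accesses'};
```

The Scoreboard line drops out here because its value contains characters other than digits, dots, E, e, and the minus sign, so only the numeric fields end up in the hash.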

The write_log() function (lines 55 to 72) first attempts to open a log file. The log file is located in the directory indicated by the -log command-line option, and is given the same name as the server it monitors. Using IO::File, the script attempts to open the log file for appending. If unsuccessful, it exits with an error message. If the write_log() function is successful in opening the log file, its next task is to write out a new log entry. There are two cases it must deal with. In the first case, the server couldn't be reached and %STATUS will not contain the expected Total Accesses field. In this case, the routine logs the time and date and writes out the line ** SERVER UNREACHABLE **. Otherwise it uses Perl's report-generating system to write out a nicely formatted log entry (see Example 2).
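The two cases can also be sketched without Perl's format machinery. The example below swaps in sprintf for the format/write mechanism the listing uses; the helper name format_log_entry and the @FIELDS array are assumptions for illustration, and the column widths are copied from the picture line on line 100.

```perl
use strict;
use warnings;

# Fields in the order the log columns appear.
my @FIELDS = ('Total Accesses', 'Total kBytes', 'CPULoad', 'Uptime',
              'ReqPerSec', 'BytesPerSec', 'BytesPerReq',
              'BusyServers', 'IdleServers');

# Hypothetical helper: build one log line from a timestamp and a parsed
# status hash. Uses sprintf instead of the listing's format/write.
sub format_log_entry {
    my ($time, %status) = @_;
    # Case 1: the expected field is missing, so the server was unreachable.
    return "$time\t** SERVER UNREACHABLE **\n"
        unless exists $status{'Total Accesses'};
    # Case 2: a normal entry; missing fields are written as 0.
    return sprintf("%-17s %7d %8d %5.2f %8d %5.2f %6.1f %5d %3d %4d\n",
                   $time,
                   map { defined $_ ? $_ : 0 } @status{@FIELDS});
}

print format_log_entry('15-Nov-1998 10:00');
print format_log_entry('15-Nov-1998 07:00',
                       'Total Accesses' => 143, 'Total kBytes' => 90,
                       CPULoad => 0.22, Uptime => 348, ReqPerSec => 0.41,
                       BytesPerSec => 260.5, BytesPerReq => 658,
                       BusyServers => 1, IdleServers => 8);
```

The trade-off is that sprintf does not print a page heading automatically, so a caller would need to write the heading itself when the log file is first created.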

The send_alert() function (lines 74 to 94) is also straightforward. It uses the Net::SMTP module to create and send a new email message addressed to the user specified by the -alert command-line option. The message gives the time and date at which the problem occurred, and indicates which server couldn't be reached.
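One thing the listing does not do is check whether the connection to the mail server succeeded. The sketch below separates building the message from sending it and adds that check. The helper names alert_message and send_alert_checked, and the explicit sender argument, are assumptions for illustration; the mail host 'localhost' is carried over from the listing.

```perl
use strict;
use warnings;
use Net::SMTP;

# Hypothetical helper: build the alert text (headers, blank line, body).
sub alert_message {
    my ($from, $to, $server, $time) = @_;
    return <<END;
From: "check servers program" <$from>
To: $to
Subject: $server is unreachable

At $time the check_servers program tried to contact the
Web server named "$server", but there was no response.
END
}

# Hypothetical variant of send_alert(): dies with a reason if the SMTP
# connection cannot be made, instead of calling methods on undef.
sub send_alert_checked {
    my ($from, $to, $server, $time) = @_;
    my $smtp = Net::SMTP->new('localhost', Timeout => 30)
        or die "Can't connect to SMTP server: $@";
    $smtp->mail($from);
    $smtp->to($to);
    $smtp->data(alert_message($from, $to, $server, $time));
    $smtp->quit;
}
```

Keeping the message text in its own function also makes it easy to inspect the alert without a running mail server.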

The end of the file contains the report format definition for the status log files. The format named LOG_TOP contains the headings for the table. The write_log() function arranges things so that the headings are printed only the first time a log file is created. The format named LOG contains a "picture line" for each entry, followed by the names of the variables to write out.

This script is intended to be run at periodic intervals using the UNIX cron facility or the Windows at command. I run it every hour on the hour. You might want to poll your servers more frequently.

Embellishments

This script is intended to be a template upon which you can build a version customized for your particular needs. Embellishments include:

* Having the script page you when a server is down by sending email to one of the integrated email/paging services.

* Attempting to restart a downed local server.

* Writing the summary information into a relational database for use in generating weekly server reliability reports.

* Turning the status logs into online HTML pages.

* Making the entries into links that point to the captured HTML output from the human-readable version of the Apache status report.
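As one illustration of these ideas, turning a status log into an HTML table might be sketched as follows. The helper names escape_html and log_to_html are hypothetical, and the code assumes the log layout shown in Example 2: a two-word timestamp followed by either nine numeric columns or the unreachable marker.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: turn the lines of one status log into an HTML table.
sub log_to_html {
    my (@lines) = @_;
    my @head = qw(Date Requests kB Load Uptime R/sec B/sec B/req Busy Idle);
    my $html = "<table>\n<tr>"
             . join('', map { "<th>$_</th>" } @head) . "</tr>\n";
    for my $line (@lines) {
        chomp $line;
        # Skip the heading line, a rule of dashes, and blank lines.
        next if $line =~ /^(?:Date\b|-+\s*\z|\s*\z)/;
        if ($line =~ /^(\S+\s+\S+)\s+\*\* SERVER UNREACHABLE \*\*/) {
            $html .= '<tr><td>' . escape_html($1)
                   . qq{</td><td colspan="9">SERVER UNREACHABLE</td></tr>\n};
            next;
        }
        # Normal entry: date and time form one cell, the rest one cell each.
        my ($date, $clock, @cols) = split ' ', $line;
        $html .= '<tr>'
               . join('', map { '<td>' . escape_html($_) . '</td>' }
                          "$date $clock", @cols)
               . "</tr>\n";
    }
    return $html . "</table>\n";
}
```

A wrapper script could read each file in the log directory, pass its lines to log_to_html(), and write the result where the Web server can serve it.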

If you find ways to enhance this script, please share them with the community by sending your modifications back to me. I'll post them on my Web site, and maybe talk about them in a future column.

Example 1

Total Accesses: 821
Total kBytes: 17021
CPULoad: .305991
Uptime: 19762
ReqPerSec: .0415444
BytesPerSec: 881.971
BytesPerReq: 21229.6
BusyServers: 3
IdleServers: 8
Scoreboard: ______W_W_W..........

Example 2

Date              Requests       KB  Load   Uptime R/sec  B/sec B/req Busy Idle
-------------------------------------------------------------------------------
15-Nov-1998 07:00      143       90  0.22      348  0.41  260.5   658    1    8
15-Nov-1998 08:00      271     4700  0.12     1948  0.12  271.1 17825    3    6
15-Nov-1998 09:00     1410    35660  0.20     5161  0.32 7078.5 25898    4    7
15-Nov-1998 10:00 ** SERVER UNREACHABLE **
15-Nov-1998 11:00 ** SERVER UNREACHABLE **
15-Nov-1998 12:00     1648    39430  0.29     5538  0.38 7291.1 24620    2    9
15-Nov-1998 13:00     8003   168510  0.31    19222  0.42 8974.7 21569    4    8
15-Nov-1998 14:00     8423   178180  0.31    20119  0.44 9061.9 21669    5    7

Listing 1

  0  #!/usr/local/bin/perl
  1  # file: check_servers
  2  # author: Lincoln Stein
  3
  4  use strict;
  5  use IO::File;
  6  use LWP::Simple;
  7  use Net::SMTP;
  8  use POSIX 'strftime';
  9  use Getopt::Long;
 10
 11  use constant DEFAULT_LOG => '/var/log/www/stats';
 12  use constant DEFAULT_URL => '/status?auto';
 13
 14  my ($SRVR_LIST,$ALERT,$LOG,$URL,%STATUS);
 15  my $TIME = strftime("%d-%b-%Y %H:%M",localtime);
 16
 17  GetOptions (
 18      'servers=s' => \$SRVR_LIST,
 19      'alert=s' => \$ALERT,
 20      'log=s' => \$LOG,
 21      'url=s' => \$URL,
 22  ) || die <<USAGE;
 23  Usage: $0 [options] [servers...]
 24  Check the status of a list of Web servers.
 25  Options:
 26  -servers <path> File containing list of servers
 27  -alert <address> E-mail address for server down warnings
 28  -log <path> Directory to log status reports to
 29  -url <url> URL to fetch to check status
 30  USAGE
 31  ;
 32
 33  # set up defaults
 34  $LOG ||= DEFAULT_LOG;
 35  $ALERT ||= $ENV{USER};
 36  $URL ||= DEFAULT_URL;
 37
 38  # get list of servers to fetch
 39  my @SERVERS = @ARGV;
 40
 41  # if -servers was specified, read server names from a file
 42  if ($SRVR_LIST) {
 43      my $fh = IO::File->new($SRVR_LIST) || die "Can't open $SRVR_LIST: $!";
 44      chomp(my @servers = <$fh>);
 45      push(@SERVERS,@servers);
 46  }
 47
 48  foreach my $server (@SERVERS) {
 49      my $content = get("http://$server$URL");
 50      %STATUS = $content =~ /^(.+): ([\d.Ee-]+)$/mg;
 51      write_log($server);
 52      send_alert($server) unless exists $STATUS{'Total Accesses'};
 53  }
 54
 55  sub write_log {
 56      my ($server) = @_;
 57      my $logfile = "$LOG/$server";
 58      my $exists = -e $logfile;
 59      my $fh = IO::File->new(">>$logfile");
 60      die "can't open $logfile for appending: $!" unless $fh;
 61      unless (exists $STATUS{'Total Accesses'}) {
 62          print $fh $TIME,"\t** SERVER UNREACHABLE **\n";
 63          return;
 64      }
 65      # these lines control the format production.
 66      select $fh; # select the log file
 67      $^ = 'LOG_TOP'; # set the top of form text
 68      $~ = 'LOG'; # set the format for the body
 69      $- = 100 if $exists; # inhibit header except for first time called
 70      write;
 71      $fh->close;
 72  }
 73
 74  sub send_alert {
 75      my $server = shift;
 76      chomp(my $hostname = `hostname -d`);
 77      my $smtp = Net::SMTP->new('localhost',Hello=>$hostname);
 78      $smtp->mail($ENV{USER});
 79      $smtp->to($ALERT);
 80      $smtp->data();
 81      $smtp->datasend(<<END);
 82  From: "check servers program" <$ENV{USER}\@$hostname>
 83  To: $ALERT
 84  Subject: $server is unreachable
 85
 86  At $TIME the check_servers program tried to contact the
 87  Web server named "$server", but there was no response.
 88
 89  Yours truly,
 90  The check_servers program
 91  END
 92      $smtp->dataend;
 93      $smtp->quit;
 94  }
 95
 96  format LOG_TOP=
 97  Date Requests kB Load Uptime R/sec B/sec B/req Busy Idle
 98  .
 99  format LOG=
100  @<<<<<<<<<<<<<<<< @###### @####### @#.## @####### @#.## @###.# @#### @## @###
101  { $TIME,@STATUS{'Total Accesses','Total kBytes',
102                   'CPULoad','Uptime','ReqPerSec',
103                   'BytesPerSec','BytesPerReq',
104                   'BusyServers','IdleServers'}
105  }
106  .