The mdmonitor daemon can help here by sending an email when a disk fails.
If you run Nagios, it is worth having Nagios monitor the status of the RAID array as well.
On the machine to be monitored (the Nagios client), you first need a working snmpd daemon.
snmpd can run an external script and publish the script's output via SNMP.
For this, install the following script on the Nagios client as nagios-linux-swraid.pl in the directory /usr/share/snmp/exec and make it executable:
#!/usr/bin/env perl
# Get status of Linux software RAID for SNMP / Nagios
# Author: Michal Ludvig
# http://www.logix.cz/michal/devel/nagios
#
# Simple parser for /proc/mdstat that outputs status of all
# or some RAID devices. Possible results are OK and CRITICAL.
# It could eventually be extended to output WARNING result in
# case the array is being rebuilt or if there are still some
# spares remaining, but for now leave it as it is.
#
# To run the script remotely via SNMP daemon (net-snmp) add the
# following line to /etc/snmpd.conf:
#
# extend raid-md0 /root/parse-mdstat.pl --device=md0
#
# The script result will be available e.g. with command:
#
# snmpwalk -v2c -c public localhost .1.3.6.1.4.1.8072.1.3.2

use strict;
use Getopt::Long;

# Sample /proc/mdstat output:
#
# Personalities : [raid1] [raid5]
# md0 : active (read-only) raid1 sdc1[1]
#       2096384 blocks [2/1] [_U]
#
# md1 : active raid5 sdb3[2] sdb4[3] sdb2[4](F) sdb1[0] sdb5[5](S)
#       995712 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
#       [=================>...]  recovery = 86.0% (429796/497856) finish=0.0min speed=23877K/sec
#
# unused devices:

my $file = "/proc/mdstat";
my $device = "all";

# Get command line options.
GetOptions('file=s'   => \$file,
           'device=s' => \$device,
           'help'     => sub { usage() });

## Strip leading "/dev/" from --device in case it has been given
$device =~ s/^\/dev\///;

## Return codes for Nagios
my %ERRORS = ('OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3, 'DEPENDENT' => 4);

## This is a global return value - set to the worst result we get overall
my $retval = 0;

my (%active_devs, %failed_devs, %spare_devs);

open(my $fh, '<', $file) or die "Can't open $file : $!";
while (<$fh>) {
    # Only array lines such as "md1 : active raid5 ..." are of interest
    next if ! /^(md\d+)+\s*:/;
    next if $device ne "all" and $device ne $1;
    my $dev = $1;

    # Classify every member such as "sdb2[4](F)" by its suffix:
    # (F) = failed, (S) = spare, no suffix = active
    my @array = split(/\s+/);
    for (@array) {
        next if ! /(\w+)\[\d+\](\(.\))*/;
        my $flag = defined($2) ? $2 : "";
        if ($flag eq "(F)") {
            $failed_devs{$dev} .= "$1,";
        }
        elsif ($flag eq "(S)") {
            $spare_devs{$dev} .= "$1,";
        }
        else {
            $active_devs{$dev} .= "$1,";
        }
    }
    if (! defined($active_devs{$dev})) { $active_devs{$dev} = "none"; }
    else { $active_devs{$dev} =~ s/,$//; }
    if (! defined($spare_devs{$dev})) { $spare_devs{$dev} = "none"; }
    else { $spare_devs{$dev} =~ s/,$//; }
    if (! defined($failed_devs{$dev})) { $failed_devs{$dev} = "none"; }
    else { $failed_devs{$dev} =~ s/,$//; }

    # The next line holds the counters, e.g. "2096384 blocks [2/1] [_U]"
    $_ = <$fh>;
    my ($devs_total, $devs_up, $stat) = /\[(\d+)\/(\d+)\]\s+\[(.*)\]$/;

    my $result = "OK";
    if ($devs_total > $devs_up or $failed_devs{$dev} ne "none") {
        $result = "CRITICAL";
        $retval = $ERRORS{"CRITICAL"};
    }
    print "$result - $dev [$stat] has $devs_up of $devs_total devices active (active=$active_devs{$dev} failed=$failed_devs{$dev} spare=$spare_devs{$dev})\n";
}
close($fh);
exit $retval;

# =====
sub usage
{
    print "
Check status of Linux SW RAID
Author: Michal Ludvig
http://www.logix.cz/michal/devel/nagios
Usage: mdstat-parser.pl [options]
--file=FILE       file to parse (default: /proc/mdstat)
--device=DEVICE   MD device to check, e.g. md0 (default: all)
";
    exit(1);
}
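The decisive part of the parser is the `[total/up]` counter pair on the line following each `mdN :` line. The following minimal POSIX sh/awk sketch reproduces only that comparison, for a quick look at /proc/mdstat on a host without Perl. It is not the author's script: it ignores the (F)/(S) flags, and the function name `mdstat_summary` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original scripts): print one
# OK/CRITICAL line per md array, based only on the "[total/up]" counter.
# Failed (F) and spare (S) flags are NOT evaluated here.
mdstat_summary() {
    awk '
        # An array line such as "md0 : active raid1 sda1[0] sdb1[1]"
        /^md[0-9]+ *:/ { dev = $1; next }
        # The following status line contains e.g. "[2/1] [_U]"
        dev != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[1] + 0 > n[2] + 0) ? "CRITICAL" : "OK"
            print state " - " dev " has " n[2] " of " n[1] " devices active"
            dev = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Called without an argument it reads /proc/mdstat; a copy of the file can be passed as the first argument instead.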
Next, on the Nagios client, add the following line to the configuration file of the SNMP daemon (usually /etc/snmp/snmpd.conf):
extend raid-md0 /usr/share/snmp/exec/nagios-linux-swraid.pl --device=md0
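If several Nagios clients are set up by script, this step can be made repeatable so that running it twice does not add the line twice. A minimal sketch, assuming a hypothetical helper `add_extend_line` that appends the line only if an identical line is not yet present:

```sh
#!/bin/sh
# Hypothetical helper: append an snmpd "extend" line to a config file
# unless exactly the same line is already present.
add_extend_line() {
    conf=$1
    line=$2
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF -- "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Usage on the client (as root): `add_extend_line /etc/snmp/snmpd.conf 'extend raid-md0 /usr/share/snmp/exec/nagios-linux-swraid.pl --device=md0'`.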
Then make snmpd reload its configuration; on Red Hat with:
# service snmpd reload
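Before switching to the Nagios machine, the result can be queried locally, for example with the snmpwalk command from the header of the Perl script. If the NET-SNMP-EXTEND-MIB is not available or not loaded on the querying machine, the symbolic name `NET-SNMP-EXTEND-MIB::nsExtendOutputFull."raid-md0"` cannot be resolved. The numeric form is the nsExtendOutputFull column `.1.3.6.1.4.1.8072.1.3.2.3.1.2`, followed by the length of the extend name and the ASCII code of each of its characters. A sketch that builds this OID (the function name `extend_output_oid` is hypothetical):

```sh
#!/bin/sh
# Hypothetical helper: numeric OID of
# NET-SNMP-EXTEND-MIB::nsExtendOutputFull."<name>"
# (string index = length, then one sub-identifier per character)
extend_output_oid() {
    name=$1
    oid=".1.3.6.1.4.1.8072.1.3.2.3.1.2.${#name}"
    rest=$name
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}     # first character of $rest
        rest=${rest#?}            # drop that character
        oid="$oid.$(printf '%d' "'$c")"   # its ASCII code
    done
    printf '%s\n' "$oid"
}
```

Usage on the client: `snmpget -v2c -c public -OvQ localhost "$(extend_output_oid raid-md0)"`.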
Now back to the Nagios machine:
There, install the shell script check_snmp_extend.sh in the directory /usr/lib/nagios/plugins with the following content:
#!/bin/sh
# Nagios "check" for querying output of scripts
# from remote servers via SNMP "extend" mechanism.
#
# Author Michal Ludvig
# http://www.logix.cz/michal/devel/nagios
#
# Example configuration
# =====================
# for monitoring SW RAID arrays. Any other service
# that can be checked with a script can be monitored
# with this approach.
#
# Put the following lines into nagios' configuration:
#
# ---- cut here ----
# $USER10$=/usr/local/nagios/libexec.local
#
# define command{
# command_name check_snmp_extend
# command_line $USER10$/check_snmp_extend.sh $HOSTADDRESS$ $ARG1$
# }
#
# define service{
# use generic-service
# host_name server.domain
# service_description RAID status
# check_command check_snmp_extend!raid-md0
# }
# ---- cut here ----
#
# On the host server.domain configure SNMP extension
# with name "raid-md0".
# Configuration goes to /etc/snmp/snmpd.conf or similar.
#
# ---- cut here ----
# extend raid-md0 /usr/local/bin/nagios-linux-swraid.pl --device=md0
# ---- cut here ----
#
# That's all. Just note that older versions of
# Net-SNMP package did not support "extend" keyword.
# You will have to use "exec" with check_snmp_exec.sh
#
# Both check_snmp_exec.sh and nagios-linux-swraid.pl
# scripts are available from:
# http://www.logix.cz/michal/devel/nagios
#
# Enjoy!
# Michal Ludvig
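. /usr/lib/nagios/plugins/utils.sh || exit 3

# Locate snmpget; without it the check cannot run
SNMPGET=$(command -v snmpget)
test -n "${SNMPGET}" && test -x "${SNMPGET}" || exit $STATE_UNKNOWN

# Arguments: host address, extend name, SNMP community.
# The community defaults to "public" if not given (assumption, as in
# the snmpwalk example of the Perl script).
HOST=$1
NAME=$2
COMMUNITY=${3:-public}
test -n "${HOST}" && test -n "${NAME}" || exit $STATE_UNKNOWN

# Fetch the complete output of the "extend" script from the remote snmpd
RESULT=$("${SNMPGET}" -v2c -c "${COMMUNITY}" -OvQ "${HOST}" "NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"${NAME}\"" 2>&1)

# The first word of the output (OK, WARNING, ...) determines the exit code
STATUS=$(echo $RESULT | cut -d' ' -f1)
case "$STATUS" in
    OK|WARNING|CRITICAL|UNKNOWN)
        RET=$(eval "echo \$STATE_$STATUS")
        ;;
    *)
        RET=$STATE_UNKNOWN
        RESULT="UNKNOWN - SNMP returned unparsable status: $RESULT"
        ;;
esac
echo $RESULT
exit $RET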
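The script takes the Nagios return codes from utils.sh, whose location can differ between distributions. If utils.sh is not available, the mapping can also be written directly with the standard plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A sketch with a hypothetical function `status_to_code` that would replace the eval-based lookup:

```sh
#!/bin/sh
# Hypothetical replacement for the utils.sh/eval lookup in
# check_snmp_extend.sh: map the first word of a plugin output
# to the standard Nagios plugin exit code.
status_to_code() {
    first=${1%% *}    # everything before the first space
    case "$first" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN or anything unparsable
    esac
}
```

Inside the check script this would read `RET=$(status_to_code "$RESULT")`.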
Now create the Nagios command check_snmp_extend by adding the following lines to the Nagios configuration (e.g. to command.cfg):
define command{
    command_name    check_snmp_extend
    command_line    $USER1$/check_snmp_extend.sh $HOSTADDRESS$ $ARG1$ $_HOSTSNMPCOMMUNITY$
}
Now a service for monitoring the RAID array can be defined:
define service{
    use                     generic-service
    host_name               NAGIOSCLIENT
    service_description     RAID status md0
    check_command           check_snmp_extend!raid-md0
}
The host itself needs the custom variable _SNMPCOMMUNITY in its definition, for example:
define host{
    use                 generic-linux
    host_name           SOME_HOSTNAME
    alias               XXXX
    address             SOME_IP_ADDRESS
    _SNMPVERSION        1
    _SNMPCOMMUNITY      public
}
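This also explains the variable _SNMPCOMMUNITY used in the Nagios command definition: custom host variables start with an underscore, and Nagios makes _SNMPCOMMUNITY available to commands as the macro $_HOSTSNMPCOMMUNITY$.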