GlusterFS, while being a nice distributed file system, provides almost no way to monitor its integrity. Servers can come and go, bricks may become stale or fail, and I'm afraid I will only find out about it when it is probably too late.
Recently we had a strange failure where everything appeared to be working, but one brick had dropped out of the volume (we found it by pure coincidence).
Is there a simple and reliable way (a cron script?) that will let me know the health status of my GlusterFS 3.2 volume?
Please check the script attached at https://www.gluster.org/pipermail/gluster-user/2012-june/010709.html, written for Gluster 3.3; it is probably easy to adapt to Gluster 3.2.
#!/bin/bash
# This Nagios script was written against version 3.3 of Gluster. Older
# versions will most likely not work at all with this monitoring script.
#
# Gluster currently requires elevated permissions to do anything. In order to
# accommodate this, you need to allow your Nagios user some additional
# permissions via sudo. The line you want to add will look something like the
# following in /etc/sudoers (or something equivalent):
#
# Defaults:nagios !requiretty
# nagios ALL=(root) NOPASSWD:/usr/sbin/gluster peer status,/usr/sbin/gluster volume list,/usr/sbin/gluster volume heal [[\:graph\:]]* info
#
# That should give us all the access we need to check the status of any
# currently defined peers and volumes.
# define some variables
ME=$(basename -- "$0")
SUDO="/usr/bin/sudo"
PIDOF="/sbin/pidof"
GLUSTER="/usr/sbin/gluster"
PEERSTATUS="peer status"
VOLLIST="volume list"
VOLHEAL1="volume heal"
VOLHEAL2="info"
peererror=
volerror=
# check for commands
for cmd in $SUDO $PIDOF $GLUSTER; do
    if [ ! -x "$cmd" ]; then
        echo "$ME UNKNOWN - $cmd not found"
        exit 3
    fi
done
# check for glusterd (management daemon)
if ! $PIDOF glusterd &>/dev/null; then
    echo "$ME CRITICAL - glusterd management daemon not running"
    exit 2
fi
# check for glusterfsd (brick daemon)
if ! $PIDOF glusterfsd &>/dev/null; then
    echo "$ME CRITICAL - glusterfsd brick daemon not running"
    exit 2
fi
# get peer status
peerstatus="peers: "
for peer in $($SUDO $GLUSTER $PEERSTATUS | grep '^Hostname: ' | awk '{print $2}'); do
    # the state is the parenthesised word at the end of the "State:" line
    state=$($SUDO $GLUSTER $PEERSTATUS | grep -A 2 "^Hostname: $peer$" | grep '^State: ' | sed -nre 's/.* \(([[:graph:]]+)\)$/\1/p')
    if [ "$state" != "Connected" ]; then
        peererror=1
    fi
    peerstatus+="$peer/$state "
done
# get volume status
volstatus="volumes: "
for vol in $($SUDO $GLUSTER $VOLLIST); do
    thisvolerror=0
    for entries in $($SUDO $GLUSTER $VOLHEAL1 "$vol" $VOLHEAL2 | grep '^Number of entries: ' | awk '{print $4}'); do
        if [ "$entries" -gt 0 ]; then
            volerror=1
            # add this brick's pending entries to the volume total
            thisvolerror=$((thisvolerror + entries))
        fi
    done
    volstatus+="$vol/$thisvolerror unsynchronized entries "
done
# drop extra space
peerstatus=${peerstatus:0:${#peerstatus}-1}
volstatus=${volstatus:0:${#volstatus}-1}
# set status according to whether any errors occurred
if [ "$peererror" ] || [ "$volerror" ]; then
    status="CRITICAL"
else
    status="OK"
fi
# actual Nagios output
echo "$ME $status $peerstatus $volstatus"
# exit with appropriate value
if [ "$peererror" ] || [ "$volerror" ]; then
    exit 2
else
    exit 0
fi
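The two parsing steps in the script (the peer state and the number of unsynchronized heal entries) can be tried without a cluster by feeding them captured command output. The sketch below wraps the same sed and awk expressions in two hypothetical functions, parse_peer_states and count_unsynced_entries, that read on stdin; the sample text is an assumption modelled on Gluster 3.3 output, not captured from a real system.

```shell
#!/bin/bash
# Hypothetical helper: print "host/state" for each peer in `gluster peer status`
# output read on stdin. The state is the parenthesised word at the end of the
# "State:" line, extracted with the same sed expression as the script above.
parse_peer_states() {
    local line host= state
    while IFS= read -r line; do
        case "$line" in
            "Hostname: "*)
                host=${line#"Hostname: "}
                ;;
            "State: "*)
                state=$(printf '%s\n' "$line" | sed -nE 's/.* \(([[:graph:]]+)\)$/\1/p')
                printf '%s/%s\n' "$host" "$state"
                ;;
        esac
    done
}

# Hypothetical helper: total of all "Number of entries: N" lines from
# `gluster volume heal VOLUME info` output on stdin (N is awk field 4).
count_unsynced_entries() {
    awk '/^Number of entries: / { total += $4 } END { print total + 0 }'
}

# Illustrative sample output (assumed format, not from a real cluster).
peers='Number of Peers: 2

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Disconnected)'

heal='Brick server1:/export/brick1
Number of entries: 0

Brick server2:/export/brick1
Number of entries: 2
/dir/file1
/dir/file2'

parse_peer_states <<< "$peers"       # server2/Connected and server3/Disconnected
count_unsynced_entries <<< "$heal"   # 2
```

Summing in awk replaces the per-entry shell loop and prints 0 when no "Number of entries:" line is present, so an empty heal report does not need special handling.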
@Arie skliarouk, your check_gluster.sh has a typo on the last line: your grep looks for exitst instead of exist. I went ahead and rewrote it to be a bit more compact and to remove the need for a temporary file.
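#!/bin/bash
# Ensure that all peers are connected
gluster peer status | grep -q Disconnected && echo "Peer disconnected." && exit 1
# Ensure that all bricks have a running log file (i.e., are sending/receiving).
# The check runs inside the loop so that $vol and $brick are still set when the
# error is reported; after a "done | grep" pipeline they would be empty, because
# the loop runs in a subshell.
for vol in $(gluster volume list); do
    for brick in $(gluster volume info "$vol" | awk '/^Brick[0-9]*:/ {print $2}'); do
        if gluster volume log locate "$vol" "$brick" 2>&1 | grep -qE "does not (exist|exitst)"; then
            echo "Log file missing - $vol/$brick ."
            exit 1
        fi
    done
done
# Explicit success status; otherwise a healthy run would exit with grep's status 1
exit 0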
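Since the question asks about cron: the Nagios script in the first answer reports problems through its exit status (0 OK, 2 CRITICAL, 3 UNKNOWN), and cron mails any output a job produces to the crontab owner, or to MAILTO if it is set. A minimal sketch of a wrapper that stays silent while the check passes and prints the check's output otherwise; the function names nagios_state and alert_if_unhealthy, the script paths and the e-mail address are all hypothetical.

```shell
#!/bin/bash
# Map a Nagios-style exit code to its state name. Plugin convention:
# 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN; other codes are treated as UNKNOWN here.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a check command. Print nothing if it exits 0; otherwise print the state,
# the exit code and the check's own output, and return the same exit code.
# cron mails any output to MAILTO, so silence means "healthy".
alert_if_unhealthy() {
    local output rc
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        printf '%s (exit %d): %s\n' "$(nagios_state "$rc")" "$rc" "$output"
    fi
    return "$rc"
}

# Hypothetical crontab entry, running every 10 minutes:
#   MAILTO=admin@example.com
#   */10 * * * * /usr/local/bin/gluster-cron.sh
# where /usr/local/bin/gluster-cron.sh defines the functions above and ends with:
#   alert_if_unhealthy /usr/local/bin/check_gluster.sh
```

Keeping the wrapper silent on success matters because cron sends a mail for every run that prints anything; only failing checks produce a message.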