
Nagios event handler script in bash to restart services: if a service does not start, do not restart the next one until it does

  • Steve Proschenko · asked 6 years ago

    Hi Stack Overflow community,

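    I need help with a bash script, as I'm new to this. Here is what I want to achieve: we have a Windows server that sometimes uses up to 90% of its memory, so whenever Nagios catches this we want to restart these services via NRPE. However, the services must not all be restarted at once: the first service has to be up and running again before the next one is restarted.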

    Another option would be to stop all 4 services and then start them one after another.

    Here is the script I wrote:

    case "$1" in
    OK)
    ;;
    WARNING)
    ;;
    UNKNOWN)
    ;;
    CRITICAL) ## DECISION ENGINE RESTART
    echo -n "Restarting Decision Engine_1"
    cat /usr/local/nagios/libexec/mail/DeServiceRestart.txt | mail -s "Restarting DE services" email@someteam.com -r Nagios@ATL-NM-01
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_1;
    if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_1 'crit=not state_is_ok()' > OK:
    then
    echo -n "Restarting Decision Engine_2"
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_2
    if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_2 'crit=not state_is_ok()' > OK:
    then
    echo -n "Restarting Decision Engine_3"
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_3
    if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_3 'crit=not state_is_ok()' > OK:
    then
    echo -n "Restarting Decision Engine_4"
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_4
    else
       echo " Restart is complete"
    fi
    ;;
    esac
    exit 0
    

    I'm not sure where I went wrong; any feedback would be appreciated.

    Thanks
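For reference, the script in the question has two problems. Bash cannot parse it: it opens three `if` blocks but closes only one with `fi`, so the `;;` that ends the CRITICAL branch is a syntax error. Also, `> OK:` does not compare check_nrpe's output with "OK": it redirects that output into a file named `OK:` in the current directory, while `if` only looks at the command's exit status (0 for OK, 2 for CRITICAL under the usual Nagios plugin convention). The sketch below shows this with `fake_check`, a hypothetical stand-in for check_nrpe, so it runs without a Nagios host.

```shell
#!/bin/bash
# Hypothetical stand-in for check_nrpe (not a real plugin): prints a status
# line and exits with the Nagios code given in $1 (0 = OK, 2 = CRITICAL).
fake_check() {
   if [ "$1" -eq 0 ]; then
      echo "OK: service is running"
   else
      echo "CRITICAL: service is not running"
   fi
   return "$1"
}

# Work in a scratch directory so the stray "OK:" file lands somewhere harmless
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pattern from the question: "> OK:" writes the plugin output into a file
# named "OK:"; the if statement still branches only on the exit status.
if fake_check 2 > OK:; then
   echo "original: next restart would run"
else
   echo "original: next restart skipped"
fi
echo "file OK: contains: $(cat OK:)"

# Corrected pattern: test the exit status directly and discard the output.
if fake_check 0 >/dev/null; then
   echo "fixed: service is up, restart the next one"
fi
```

In other words, the branching itself already worked on exit codes; the real fixes are closing every `if` with `fi` and dropping the `> OK:` redirection (or sending the output to `/dev/null`).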

  • Sasha Golikov · answered 6 years ago

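    All explanations are in the code comments. Double-check the StopService function: you didn't mention how you stop the services, so I wrote it along the same lines as the restart.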

    #!/bin/bash
    
    SERVICESTATE=$1       # Service state: OK, WARNING, UNKNOWN or CRITICAL ($SERVICESTATE$)
    Host=$2               # Host name or IP address
    SERVICESTATETYPE=$3   # State type: SOFT or HARD ($SERVICESTATETYPE$)
    
    TimeOut=3             # Approximate time (seconds) to wait for a service to
                          # start/stop before moving on to the next one.
                          # Do not make it too large: Nagios kills an event
                          # handler that runs too long (event_handler_timeout),
                          # and each check can itself take up to 30 s (-t 30).
    
    
    # Service names, in the order they must be restarted
    Services=(DecisionEngine_1 DecisionEngine_2 DecisionEngine_3 DecisionEngine_4)
    
    # Add the Nagios plugins directory (where check_nrpe lives) to PATH
    PATH=$PATH:/usr/local/nagios/libexec
    
    RestartService() {
       # Restarts one service via NRPE.
       # Usage: RestartService ServiceName
       # restart_service must be defined as a command on the NRPE agent.
       echo -n " Restarting $1;"
       check_nrpe -H "${Host}" -p 5666 -c restart_service -a "$1" >/dev/null 2>&1
       return $?
    }
    
    StopService() {
       # Stops one service via NRPE.
       # Usage: StopService ServiceName
       # Assumes a stop_service command defined on the agent, like restart_service.
       echo -n " Stopping $1;"
       check_nrpe -H "${Host}" -p 5666 -c stop_service -a "$1" >/dev/null 2>&1
       return $?
    }
    
    ServiceWait() {
       # Polls a service via NRPE until it reaches the requested state
       # or TimeOut runs out.
       # Usage: ServiceWait ServiceName {start|stop}
       #   start: wait until check_service succeeds (exit code 0)
       #   stop:  wait until check_service fails (non-zero exit code)
       # Returns 0 when the requested state is reached, 3 on timeout.
       local Logic="" t RC
       [ "$2" == "start" ] && Logic="-eq"   # RC of the check should be 0
       [ "$2" == "stop" ]  && Logic="-ne"   # RC of the check should NOT be 0
       [ -z "$Logic" ] && { echo "ServiceWait function usage error"; exit 19; }
       t=${TimeOut}
       while [ "$t" -ge 0 ]; do
          check_nrpe -H "${Host}" -p 5666 -t 30 \
                     -c check_service -a "$1" 'crit=not state_is_ok()' >/dev/null 2>&1
          RC=$?
          # Requested state reached, no need to wait any longer
          [ "$RC" $Logic 0 ] && { echo -n "CheckRC=$RC;"; return 0; }
          t=$((t - 1))
          sleep 1
       done
       echo -n "TimeOut; "
       return 3
    }
    
    # All three arguments are required
    if [ -z "${SERVICESTATE}" ] || [ -z "${Host}" ] || [ -z "${SERVICESTATETYPE}" ]; then
        echo "Usage: $0 {OK|WARNING|UNKNOWN|CRITICAL} Hostname {SOFT|HARD}"
        exit 1
    fi
    
    case "${SERVICESTATE}" in
       OK|WARNING|UNKNOWN)
          # Nothing to do for these states
       ;;
       CRITICAL) ## DECISION ENGINE RESTART
          # Uncomment to send the notification e-mail
          #cat /usr/local/nagios/libexec/mail/DeServiceRestart.txt | \
          # mail -s "Restarting DE services" email@someteam.com -r Nagios@ATL-NM-01
          RC=0
          SuccessRestart=()
    
          if [ "$SERVICESTATETYPE" == "SOFT" ]; then
             # Restart the services one by one
             for Service in "${Services[@]}"; do
                RestartService "$Service"
                ServiceWait "$Service" start
                RC=$?
                # If this service did not come back, do not restart any more
                [ "$RC" -ne 0 ] && break
                SuccessRestart+=("$Service")
             done
             echo "Restarted: ${SuccessRestart[*]}. Return code is ${RC}"
          elif [ "$SERVICESTATETYPE" == "HARD" ]; then
             # Stop all services sequentially
             for Service in "${Services[@]}"; do
                StopService "$Service"
                # Experiment with what to wait for here: it may work better
                # to just sleep for N seconds while the service stops
                # rather than poll its state
                ServiceWait "$Service" stop
                #sleep $TimeOut
             done
             # Start all services sequentially
             for Service in "${Services[@]}"; do
                RestartService "$Service"
                ServiceWait "$Service" start
                RC=$?
                # If this service did not come back, do not start any more
                [ "$RC" -ne 0 ] && break
                SuccessRestart+=("$Service")
             done
             echo "Started: ${SuccessRestart[*]}. Return code is ${RC}"
          else
             echo "Unknown SERVICESTATETYPE option: $SERVICESTATETYPE"
             exit 20
          fi
       ;;
    esac
    exit 0
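One thing to keep in mind with the timeouts in this handler: Nagios Core kills an event handler that runs longer than `event_handler_timeout` in nagios.cfg (30 seconds in the sample configuration), and the worst case here is much longer than `TimeOut=3` suggests. Every poll in ServiceWait can block for up to 30 seconds (`-t 30`), and each restart call for up to check_nrpe's default 10 seconds. A rough upper bound for the SOFT branch, under those assumptions (`worst_case_soft` is a hypothetical helper, not part of the handler):

```shell
#!/bin/bash
# Loose upper bound (seconds) on the SOFT branch, assuming every NRPE call
# may take up to its timeout: 10 s for the restart call (check_nrpe default)
# and 30 s per poll (-t 30), plus the 1 s sleep after each poll.
worst_case_soft() {
   local services=$1 timeout=$2
   local restart_call=10 poll_call=30 sleep_per_poll=1
   local polls=$(( timeout + 1 ))     # ServiceWait loops while t >= 0
   echo $(( services * (restart_call + polls * (poll_call + sleep_per_poll)) ))
}

worst_case_soft 4 3   # → 536
```

With these values even a single service (134 s) exceeds a 30 s limit, so it may be worth raising `event_handler_timeout` or lowering `-t` and `TimeOut` to fit your environment.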