Automating Actions from Log Insight Alerts Using the Perl SDK

In a previous post, I talked about how my Log Insight server was getting spammed by some hosts that have these ‘ipmi’ errors. The temporary fix is to remote into the systems and run some commands. My colleague also found out that some of the errors could be cleared with a PowerCLI script.

The script was great, but we were getting annoyed that every email alert meant opening PowerCLI in order to remediate the issue.

I looked for ways to automate actions based off of Log Insight alerts and found two articles:

Log Insight Alerts SNMP Scripts
Steve demonstrates how to awk the alerts file to get the hostname and the title of the alert and then ssh into the host. I don’t think the tail method that Steve used actually works, though (it didn’t for me); I figured out a modified syntax that seemed to tail the file continuously and correctly. SSH’ing into the host would work for me, but I would have to set up SSH keys, and for new hosts the SSH thumbprint sometimes changes after a rebuild and breaks things.

Quick Post: Launching Custom Actions on LogInsight Alerts
Matt demonstrates how to parse the alert file using Python/pyvmomi

Both of these assume that you want to run the scripts on the Log Insight server and not on Windows.
My first idea was to set up a Samba share on the LI server and have the Windows server scrape the alert log file.
A second idea was to set up an NFS share on the LI server and have a Linux server with the Perl SDK on it read the file.
A third idea was to have a script on the LI server scrape the log file and then push the content via FTP.

All of the ideas were either not possible given what was available on the LI server or weren’t very secure (FTP).

The whole reason I thought of option 2 was that I assumed the Perl SDK was not available on the LI server, but surprise, surprise, it is! pyvmomi may be available as well, but I wasn’t as familiar with where that was installed to (I did install the Perl SDK).

UPDATE 01-09-2015:
I was installing the scripts on my three Log Insight servers and one server DID NOT have the Perl SDK. The ones that did have it were originally installed as 1.0 and upgraded to 1.5, 2.0 and then 2.5. The one that DID NOT have it was installed as 2.0 and upgraded to 2.5. I will probably install the SDK on that last server at some point, but I have reached out to someone from the Log Insight team to see if it is supposed to be there as a standard (so I can count on it in the future).

My approach is a hybrid of the two articles above. I have a bash script that runs forever, continuously tailing the alert file (this runs as a service). The bash script scrapes the relevant info and then passes it to a customized version of the hostops.pl Perl script that is included with the Perl SDK. Note that these scripts are pretty rough, but in general they work.

The first script is /etc/init.d/monitoripmi.
It is needed to start and stop the service (I looked at /etc/init.d/skeleton and /etc/init.d/loginsight for reference).

#!/bin/sh
#
# monitoripmi Start/stop monitoripmi script
# chkconfig: 2345 99 01
# description: Monitors VMware Log Insight Logs and if IPMI error is found, connects to host and clears it
#
### BEGIN INIT INFO
# Provides:          monitoripmi
# Required-Start:    Log Insight
# Should-Start:
# Required-Stop:
# Should-Stop:
# Default-Start:     2 3 5
# Default-Stop:      0 1 4 6
# Description:       Start/stop monitoripmi
### END INIT INFO
FOO_BIN=/usr/lib/vmware-vcli/apps/host/monitoripmi
test -x $FOO_BIN || { echo "$FOO_BIN not installed";
        if [ "$1" = "stop" ]; then exit 0;
        else exit 5; fi; }

. /etc/rc.status

# Reset status of this service
rc_reset


case "$1" in
    start)
        echo -n "Starting monitoripmi "
        ## Start daemon with startproc(8). If this fails
        ## the return value is set appropriately by startproc.
        /sbin/startproc $FOO_BIN

        # Remember status and be verbose
        rc_status -v
        ;;
    stop)
        echo -n "Shutting down monitoripmi "
        ## Stop daemon with killproc(8) and if this fails
        ## killproc sets the return value according to LSB.

        /sbin/killproc -TERM $FOO_BIN

        # Remember status and be verbose
        rc_status -v
        ;;
    try-restart|condrestart)
        ## Do a restart only if the service was active before.
        ## Note: try-restart is now part of LSB (as of 1.9).
        ## RH has a similar command named condrestart.
        if test "$1" = "condrestart"; then
                echo "${attn} Use try-restart ${done}(LSB)${attn} rather than condrestart ${warn}(RH)${norm}"
        fi
        $0 status
        if test $? = 0; then
                $0 restart
        else
                rc_reset        # Not running is not a failure.
        fi
        # Remember status and be quiet
        rc_status
        ;;
    restart)
        ## Stop the service and regardless of whether it was
        ## running or not, start it again.
        $0 stop
        $0 start

        # Remember status and be quiet
        rc_status
        ;;
    force-reload)
        ## Signal the daemon to reload its config. Most daemons
        ## do this on signal 1 (SIGHUP).
        ## If it does not support it, restart the service if it
        ## is running.

        echo -n "Reload service monitoripmi "
        ## if it supports it:
        /sbin/killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v

        ## Otherwise:
        #$0 try-restart
        #rc_status
        ;;
    reload)
        ## Like force-reload, but if daemon does not support
        ## signaling, do nothing (!)

        # If it supports signaling:
        echo -n "Reload service monitoripmi "
        /sbin/killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v

        ## Otherwise if it does not support reload:
        #rc_failed 3
        #rc_status -v
        ;;
    status)
        echo -n "Checking for service monitoripmi"
        ## Check status with checkproc(8), if process is running
        ## checkproc will return with exit status 0.

        # Return value is slightly different for the status command:
        # 0 - service up and running
        # 1 - service dead, but /var/run/  pid  file exists
        # 2 - service dead, but /var/lock/ lock file exists
        # 3 - service not running (unused)
        # 4 - service status unknown :-(
        # 5--199 reserved (5--99 LSB, 100--149 distro, 150--199 appl.)

        # NOTE: checkproc returns LSB compliant status values.
        /sbin/checkproc $FOO_BIN
        # NOTE: rc_status knows that we called this init script with
        # "status" option and adapts its messages accordingly.
        rc_status -v
        ;;
    *)
        echo "Usage: $0 {start|stop|status|try-restart|restart|force-reload|reload}"
        exit 1
        ;;
esac
rc_exit

The next script, located at /usr/lib/vmware-vcli/apps/host/monitoripmi, is the bash script that monitors the alert log file. Note that I am matching the title of my alert.

#!/bin/bash
export HOME
HOME=/root

IFS='
'
tail -F  /storage/var/loginsight/alert.log | while read a
do
  # Compare the alert title with the one alert this script handles
  if [ "$(echo "$a" | awk '{ split($0,a,"hostname"); split(a[1],b,"\""); print b[6]}')" = "Log Insight being Spammed by ipmiifcselreadentry" ]
  then
    servertoclear=$(echo "$a" | awk '{ split($0,a,"hostname"); split(a[2],b,"\""); print b[3]}')
    # ">> file 2>&1" sends stderr to the log as well; "2>&1 >> file" does not
    echo "$servertoclear is spamming, will check if credentials exist" >> /storage/var/loginsight/monitoripmi.log 2>&1
    /usr/lib/vmware-vcli/apps/general/credstore_admin.pl list | grep "$servertoclear" > /dev/null
    if [ $? -eq 0 ]
    then
      echo "Credentials exist for $servertoclear" >> /storage/var/loginsight/monitoripmi.log 2>&1
      echo "Attempting to clear $servertoclear" >> /storage/var/loginsight/monitoripmi.log 2>&1
      /usr/lib/vmware-vcli/apps/host/hostops2.pl --operation clearipmi --target_host "$servertoclear" --server "$servertoclear" --username root >> /storage/var/loginsight/monitoripmi.log 2>&1
      subject="Cleared IPMI on $servertoclear"
      from="loginsight@domain.com"
      recipients="person@domain.com"
      # The blank line (\n\n) separates the headers from the message body
      mail="subject:$subject\nfrom:$from\n\nIPMI cleared on $servertoclear"
    else
      echo "Credentials do not exist for $servertoclear" >> /storage/var/loginsight/monitoripmi.log 2>&1
      subject="Cannot clear IPMI on $servertoclear"
      from="loginsight@domain.com"
      recipients="person@domain.com"
      mail="subject:$subject\nfrom:$from\n\nMissing credentials in credstore for $servertoclear"
    fi
    echo -e "$mail" | /usr/sbin/sendmail "$recipients"

  fi
done
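
To see what the two awk expressions in the loop pull out, it can help to run them against a single line by hand. The sketch below does that with a made-up line. The real alert.log layout is not shown in this post, so the sample line, the host name esx01.example.com, and the function names alert_title and alert_host are assumptions for illustration only; just the awk programs themselves are taken from the script above.

```shell
#!/bin/sh
# Hypothetical alert line: NOT the real alert.log format, only a shape
# that the awk expressions from the monitoring script can parse.
sample='{"alert":{"name":"Log Insight being Spammed by ipmiifcselreadentry"},"hostname":"esx01.example.com"}'

# Title: take the text before the word "hostname", split it on double
# quotes, and print the sixth piece.
alert_title() {
  printf '%s\n' "$1" | awk '{ split($0,a,"hostname"); split(a[1],b,"\""); print b[6] }'
}

# Host: take the text after the word "hostname", split it on double
# quotes, and print the third piece.
alert_host() {
  printf '%s\n' "$1" | awk '{ split($0,a,"hostname"); split(a[2],b,"\""); print b[3] }'
}

alert_title "$sample"
alert_host "$sample"
```

Because both fields are found by counting quote characters, any change in the alert format would shift the indexes, so a check like this is worth repeating against a real line after a Log Insight upgrade.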

Next is a snippet of what I added to hostops.pl (which I made a copy of and run as hostops2.pl)

#clear_ipmi
#
sub clear_ipmi {
   my $target_host = shift;
   my $host_name = Opts::get_option('target_host');
   $target_host = ($entity_view);
   # Views of the host's health status system and service system
   my $target_host_healthstatussystem = Vim::get_view(mo_ref => $target_host->configManager->healthStatusSystem);
   my $target_host_servicesystem = Vim::get_view(mo_ref => $target_host->configManager->serviceSystem);
   my $services = $target_host_servicesystem->serviceInfo->service;
   my $service = "sfcbd-watchdog";
   eval {
      # Restart the CIM broker watchdog, then reset and refresh the
      # hardware sensors, pausing 15 seconds after each step
      $target_host_servicesystem->RestartService(id => $service);
      Util::trace(0, "\nHost '$host_name' restarting sfcbd-watchdog\n");
      sleep 15;
      $target_host_healthstatussystem->ResetSystemHealthInfo();
      Util::trace(0, "\nHost '$host_name' resetting HW Sensors...\n");
      sleep 15;
      $target_host_healthstatussystem->RefreshHealthStatusSystem();
      Util::trace(0, "\nHost '$host_name' refreshing data...\n");
      sleep 15;
      $target_host_healthstatussystem->update_view_data();
      Util::trace(0, "\nHost '$host_name' refreshing data view...\n");
      sleep 15;
   };
   if ($@) {
      if (ref($@) eq 'SoapFault') {
         if (ref($@->detail) eq 'InvalidState') {
            Util::trace(0,"\n The operation is not allowed in the current state\n");
         }
         elsif (ref($@->detail) eq 'RuntimeFault') {
            Util::trace(0,"\nRuntime Fault\n");
         }
         else {
            Util::trace(0, "Error: "  . $@ . " ");
         }
      }
      else {
         Util::trace(0, "Error: "  . $@ . " ");
      }
   }
}

I ran into an issue with Perl not liking my self-signed certs. To get around this, add the following to the script to disable SSL verification in LWP:

$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

One of the last things you must do is add the credentials that the Perl SDK will use to access your hosts. Repeat the following for every host that this script may touch:

/usr/lib/vmware-vcli/apps/general/credstore_admin.pl add -s HOSTNAME.FQDN -u root -p PASSWORD
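
The monitoring script decides whether credentials exist by piping `credstore_admin.pl list` through grep, which matches substrings, so a host named esx1 would also match an entry for esx10. A stricter check could compare whole fields instead. This is only a sketch: it assumes the listing puts the server name in the first whitespace-separated column, which is an assumption about the output format, and has_credentials is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if some line on stdin has the given
# host name as its entire first column (assumed layout: "server  username").
has_credentials() {
  awk -v host="$1" '$1 == host { found = 1 } END { exit !found }'
}

# Usage in the monitoring loop would look something like:
#   /usr/lib/vmware-vcli/apps/general/credstore_admin.pl list | has_credentials "$servertoclear"
```

The helper's exit status can be tested with `$?` in the same way the grep result is tested in the script.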

To put this all together, add monitoripmi as a service and then start it:

# chkconfig --add monitoripmi
# service monitoripmi start

Note: the first command looks at /etc/init.d/monitoripmi.

Note that this may re-process some alerts that have already occurred. The Python script in one of the articles linked above used a Python file monitor module, and I’m sure there is one for Perl as well. My goal, though, was to use ONLY what was already on the LI server (note that I had to use sendmail since there was no mail command).
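
One way to cut down on the re-processing, using only tail itself, might be to start following at the current end of the file. By default tail prints the last ten lines before it starts following, so every restart of the service replays those lines; `-n 0` skips them. The trade-off is that alerts written while the service is stopped are never seen. follow_new_alerts is a hypothetical wrapper name, not something from the scripts above.

```shell
#!/bin/sh
# Hypothetical wrapper: emit only lines appended after this call starts.
#   -n 0  start at the current end of the file (skip existing lines)
#   -F    keep following by name, retrying if the file is rotated
follow_new_alerts() {
  tail -n 0 -F "$1"
}

# In the monitoring script this would stand in for the plain "tail -F":
#   follow_new_alerts /storage/var/loginsight/alert.log | while read a; do ...; done
```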

Overall, this was a nice introduction for me to the Perl SDK and got me back to my Unix/Linux roots. Hopefully launching custom actions or some sort of remediation engine is in the future for the vROPS suite.
