Permanent Hardware Error Notification Method
Described here is a technique used by TXU on its AIX systems to make
them automatically notify the enterprise error logging systems when a
hardware error occurs. Notifications can be delivered through several
mechanisms, but all of them are fed by the single error logging system
built into AIX.
To configure the AIX Error Logging system, perform the following
steps.
- Create a file called "/tmp/hardware.add" containing
the following Error Notification object definition:
errnotify:
en_type = "PERM"
en_class = "H"
en_method = "/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9"
To add the object to the Error Notification object class, run the
command:
odmadd /tmp/hardware.add
The "odmadd" command adds the Error Notification object
contained in "/tmp/hardware.add" to the errnotify file.
- To verify that the Error Notification object was added to the object
class, enter:
odmget -q"en_class='H' and en_type='PERM' and en_method='/usr/sbin/errnotify.ksh \$1 \$2 \$3 \$4 \$5 \$6 \$7 \$8 \$9'" errnotify
The "odmget" command locates the Error Notification
object within the ODM that has an en_class value of "H", an
en_type of "PERM", and an en_method of
"/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9",
and displays the object. The following output is returned:
errnotify:
en_pid = 0
en_name = ""
en_persistenceflg = 0
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = ""
en_symptom = ""
en_err64 = ""
en_dup = ""
en_method = "/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9"
- To save the PERMANENT HARDWARE Error Notification object in the ODM
so it persists through successive reboots of the system, run the
following command as "root":
savebase
- To delete the PERMANENT HARDWARE Error Notification object from the Error
Notification object class, enter:
odmdelete -q"en_class='H' and en_type='PERM' and en_method='/usr/sbin/errnotify.ksh \$1 \$2 \$3 \$4 \$5 \$6 \$7 \$8 \$9'" errnotify
The "odmdelete" command locates the Error Notification
object within the ODM that has an en_class value of "H", an
en_type of "PERM", and an en_method of
"/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9", and
removes it from the Error Notification object class in the ODM.
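The steps above can be collected into a small sketch. The function
names "write_stanza" and "notify_query" are hypothetical and are not
part of AIX or of the original scripts; they only build the stanza file
and the query criteria, and the actual ODM commands are shown in the
trailing comments.

```shell
# Hypothetical helpers; write_stanza and notify_query are illustrative names.

# The method string is single-quoted so $1..$9 stay literal.
METHOD='/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9'

# Write the errnotify stanza to the file named in $1.
write_stanza() {
    cat > "$1" <<EOF
errnotify:
        en_type = "PERM"
        en_class = "H"
        en_method = "$METHOD"
EOF
}

# Print the ODM criteria that select this notification object.
notify_query() {
    printf "en_class='H' and en_type='PERM' and en_method='%s'" "$METHOD"
}

# Usage on AIX, as root:
#   write_stanza /tmp/hardware.add && odmadd /tmp/hardware.add
#   odmget -q "$(notify_query)" errnotify
#   savebase
```

Keeping the method string in one variable means the add, verify, and
delete steps all use exactly the same criteria.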
NOTE: One problem with this error notification technique is that even
though the "savebase" command may have been run to save the Error
Notification object in the ODM, this ODM change is sometimes lost. To
ensure the Error Notification method exists, the ODM should be checked
for it each time an AIX system is rebooted, and the method added again
if it is missing. The script that checks the ODM for the PERMANENT
HARDWARE Error Notification Method must exist on each AIX system and
must have the file name, permissions, owner, and group as follows:
chmod 555 /usr/sbin/ckerrnotify.ksh
chown bin /usr/sbin/ckerrnotify.ksh
chgrp bin /usr/sbin/ckerrnotify.ksh
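The reboot-time check described in the note above might look roughly
like the following. This is a minimal sketch and not the original
"ckerrnotify.ksh"; the function name "ensure_errnotify" is hypothetical,
and the sketch assumes that "odmget" prints nothing when no object
matches and that the stanza file passed as the first argument already
exists.

```shell
# Hypothetical sketch of the ODM check; NOT the original ckerrnotify.ksh.

METHOD='/usr/sbin/errnotify.ksh $1 $2 $3 $4 $5 $6 $7 $8 $9'
QUERY="en_class='H' and en_type='PERM' and en_method='$METHOD'"

# Re-add the notification object when the ODM query returns nothing.
# $1 = stanza file to load with odmadd (assumed to exist).
ensure_errnotify() {
    if [ -z "$(odmget -q "$QUERY" errnotify 2>/dev/null)" ]; then
        # Object is missing: add it again and save the change.
        odmadd "$1" && savebase
    fi
}

# Usage on AIX, as root:
#   ensure_errnotify /tmp/hardware.add
```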
Script Source Code for "ckerrnotify.ksh"
This document contains the source code for the
Disaster Recovery script "ckerrnotify.ksh".
This file last modified 02/04/09
To ensure the PERMANENT HARDWARE Error Notification Method is in the
ODM, the "ckerrnotify.ksh" script must be executed every
time the machine reboots. The following entry in
"/etc/inittab" will execute the
"/etc/rc.local" script when the system is rebooted:
local:2:once:/etc/rc.local > /dev/console 2>&1
To add this entry to "/etc/inittab", run the
following command:
mkitab "local:2:once:/etc/rc.local > /dev/console 2>&1"
chmod 555 /etc/rc.local
chown bin /etc/rc.local
chgrp bin /etc/rc.local
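Running "mkitab" a second time for an identifier that already exists
fails, so it can help to test for the record first. The sketch below
assumes the AIX "lsitab" command prints nothing when no record with the
given identifier exists; the function name "ensure_inittab_entry" is
hypothetical.

```shell
# Hypothetical helper; ensure_inittab_entry is an illustrative name.
# Adds the rc.local record only when no record with identifier "local" exists.
ensure_inittab_entry() {
    if [ -z "$(lsitab local 2>/dev/null)" ]; then
        mkitab "local:2:once:/etc/rc.local > /dev/console 2>&1"
    fi
}

# Usage on AIX, as root:
#   ensure_inittab_entry
```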
The "ckerrnotify.ksh" script should be executed from
within the "/etc/rc.local" script. A snippet of code from
the "rc.local" script follows. This code checks whether
the "/usr/sbin/errnotify.ksh" script exists and is
executable, and runs the "/usr/sbin/ckerrnotify.ksh" script
to check the ODM:
...
...
...
Script Source Code for "odmrclocal.ksh"
This document contains the source code for the
Disaster Recovery script "odmrclocal.ksh".
This file last modified 02/04/09
...
...
...
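The "rc.local" check described above (confirm that the notification
script exists and is executable, then run the ODM checker) can be
sketched as follows. This is not the original "rc.local" code; the
function name "run_odm_check" is hypothetical, and the two paths are
taken as optional arguments only so the logic can be tried outside
"/usr/sbin".

```shell
# Hypothetical sketch; NOT the original rc.local code.
# $1 = notification script, $2 = ODM checker script.
# Defaults are the paths named in this document.
run_odm_check() {
    notify=${1:-/usr/sbin/errnotify.ksh}
    checker=${2:-/usr/sbin/ckerrnotify.ksh}
    if [ -x "$notify" ]; then
        # Notification script is present and executable: check the ODM.
        "$checker"
    else
        echo "rc.local: $notify is missing or not executable" >&2
        return 1
    fi
}

# Usage inside /etc/rc.local:
#   run_odm_check
```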
The PERMANENT HARDWARE Error Notification Method specifies a script
to run in the event an error of this type is logged to the AIX error
log. This error notification script must exist on each AIX system and
must have the file name, permissions, owner and group as follows:
chmod 555 /usr/sbin/errnotify.ksh
chown bin /usr/sbin/errnotify.ksh
chgrp bin /usr/sbin/errnotify.ksh
This script sends error notifications to designated targets such as
Tivoli, e-mail, and OpenView.
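As an illustration of how such a script can use the nine arguments named
in the en_method string, the sketch below builds a one-line summary from
them. It is not the original "errnotify.ksh". The function name
"notify_subject" and the mail address in the usage comment are
hypothetical. The argument meanings follow the AIX error notification
documentation.

```shell
# Hypothetical sketch; NOT the original errnotify.ksh.
# Arguments passed to an en_method by AIX error notification:
#   $1 sequence number   $2 error ID         $3 error class
#   $4 error type        $5 alert flags      $6 resource name
#   $7 resource type     $8 resource class   $9 error label
notify_subject() {
    printf 'AIX %s hardware error %s on %s (seq %s)\n' "$4" "$9" "$6" "$1"
}

# Usage inside a notification script on AIX (address is a placeholder):
#   errpt -a -l "$1" | mail -s "$(notify_subject "$@")" admin@example.com
```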
Script Source Code for "errnotify.ksh"
This document contains the source code for the
Disaster Recovery script "errnotify.ksh".
This file last modified 02/04/09
The following three records are added to each error message processed
by the error notification script:
Machine Class: RS/6000
Machine Type: $( lsattr -El sys0 -a modelname | awk '{ print $2}' )
Operating System: AIX $( oslevel )
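A minimal sketch of how these three records could be emitted is shown
below. The function name "machine_records" is hypothetical; the model
name and OS level are passed in as arguments so the formatting can be
tried on any system, with the AIX commands from the records above shown
in the usage comment.

```shell
# Hypothetical helper; machine_records is an illustrative name.
# $1 = machine model name, $2 = operating system level.
machine_records() {
    printf 'Machine Class: RS/6000\n'
    printf 'Machine Type: %s\n' "$1"
    printf 'Operating System: AIX %s\n' "$2"
}

# Usage on AIX:
#   machine_records "$(lsattr -El sys0 -a modelname | awk '{ print $2 }')" "$(oslevel)"
```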
Two other files are required by the PERMANENT HARDWARE Error
Notification script. These files provide the communication mechanism
that sends messages from the script to Tivoli TEC, and they must exist
on each AIX system for the error notification to work. Their actual
location varies with the version of the software installed. The Error
Notification script expects to find them under the following path
names:
/usr/sbin/lcf_env.sh
/usr/sbin/wpostemsg
To provide consistency, a symbolic link is created to point to each
of the required files. But before creating the link, the actual
location of each file must be determined:
find / -name lcf_env.sh -print
find / -name wpostemsg -print
The results of the "find" commands are used with the
symbolic link command to create a link to each of the required
files:
ln -s <Full path to lcf_env.sh> /usr/sbin/lcf_env.sh
ln -s <Full path to wpostemsg> /usr/sbin/wpostemsg
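The find-then-link procedure above can be combined into one step. The
following is a sketch only: the function name "link_required_file" is
hypothetical, it takes the first match that "find" reports (which may
not be the wanted copy if several versions are installed), and the
search root and link directory are arguments so it can be tried on a
scratch directory first.

```shell
# Hypothetical helper; link_required_file is an illustrative name.
# $1 = file name to locate, $2 = directory tree to search,
# $3 = directory in which to create the symbolic link.
link_required_file() {
    name=$1
    search_root=$2
    link_dir=$3
    # Regular files only, so existing symbolic links are not matched.
    target=$(find "$search_root" -name "$name" -type f -print 2>/dev/null | head -n 1)
    if [ -z "$target" ]; then
        echo "$name not found under $search_root" >&2
        return 1
    fi
    ln -s "$target" "$link_dir/$name"
}

# Usage on AIX, as root:
#   link_required_file lcf_env.sh / /usr/sbin
#   link_required_file wpostemsg  / /usr/sbin
```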