Fix for hp-health on DL100 series running CentOS6

I have an HP DL160 G6 server running CentOS 6.2. I installed hp-health on it, which is common practice for getting HP-specific hardware health information.

The installation was successful. However, when I tried to start the hp-health service, it failed with the following segmentation fault:

/etc/init.d/hp-health: line 666: 11390 Segmentation fault (core dumped) $PNAME $PARGS < /dev/null >> $LOGFILE 2>&1

Snapshots:

[[email protected] ~]# /etc/init.d/hp-health start
  Using Proliant Standard
     IPMI based 1XX System Health Monitor
  Using standard Linux IPMI device driver
Starting ipmi drivers:                                     [  OK  ]
  Starting Proliant Standard
     IPMI based 1XX System Health Monitor (hpasmpld):
/etc/init.d/hp-health: line 666: 11390 Segmentation fault      (core dumped) $PNAME $PARGS < /dev/null >> $LOGFILE 2>&1

This is what I found in /var/log/messages:

Aug 16 19:45:52 gagan hpasmpld[11298]: ehpsmb_parse_SMBIOS: SMBIOSInitTable was not successful.
Aug 16 19:45:52 gagan kernel: hpasmpld:11298 map pfn expected mapping type uncached-minus for 9e000-a0000, got write-back
Aug 16 19:45:52 gagan kernel: hpasmpld[11298]: segfault at 0 ip 0000000000414918 sp 00007fff09822e18 error 4 in hpasmpld[400000+2a000]
Aug 16 19:45:52 gagan abrt[11299]: saved core dump of pid 11298 (/opt/hp/hp-health/bin/hpasmpld) to /var/spool/abrt/ccpp-2012-08-16-19:45:52-11298.new/coredump (516096 bytes)
Aug 16 19:45:52 gagan abrtd: Directory 'ccpp-2012-08-16-19:45:52-11298' creation detected
Aug 16 19:45:52 gagan abrtd: Package 'hp-health' isn't signed with proper key
Aug 16 19:45:52 gagan abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-08-16-19:45:52-11298 (res:2), deleting
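
The kernel line above reports "segfault at 0 ... error 4". On x86 the error number is the page-fault error code, where bit 0 distinguishes a protection violation from a missing page, bit 1 a write from a read, and bit 2 user mode from kernel mode. A minimal sketch for decoding it follows; the function name decode_pf_error is made up for this illustration and is not part of any HP or CentOS tool.

```shell
# decode_pf_error CODE
# Describe the low three bits of an x86 page-fault error code, as printed
# in a kernel "segfault at ... error N" line. (Hypothetical helper.)
decode_pf_error() {
    code=$1
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}
```

Calling decode_pf_error 4 describes a user-mode read of a page that is not present. Together with the fault address of 0, that suggests hpasmpld followed a null pointer, possibly as a consequence of the failed SMBIOSInitTable call logged just before it.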

I tried many combinations of configuration changes, but could not get the hp-health service to start.

I finally came across the following workaround for hp-health on DL100 series servers running CentOS 6. It is a temporary fix rather than a proper solution.

Fix for hp-health on DL100 series running CentOS6

The problem is that hp-health tries to read a memory block that another process is already reading. The process in question is mcelogd.

A fix for this problem is to stop mcelogd.

[[email protected] ~]# /etc/init.d/mcelogd stop

Output:

[[email protected] ~]# /etc/init.d/mcelogd stop
Stopping mcelog
[[email protected] ~]#                                  [  OK  ]

And then start the hp-health service.

[[email protected] ~]# /etc/init.d/hp-health start

Output:

[[email protected] ~]# /etc/init.d/hp-health start
  Using Proliant Standard
 	IPMI based 1XX System Health Monitor
  Using standard Linux IPMI device driver
Starting ipmi drivers:                                     [  OK  ]
  Starting Proliant Standard
 	IPMI based 1XX System Health Monitor (hpasmpld): 
                                                           [  OK  ]
[[email protected] ~]# /etc/init.d/hp-health status
  Using Proliant Standard
 	IPMI based 1XX System Health Monitor
  Using standard Linux IPMI device driver
  
ipmi_msghandler module loaded.
ipmi_si module loaded.
ipmi_devintf module loaded.
/dev/ipmi0 exists.
  
  (hpasmpld) is running...                                 [  OK  ]

Now start mcelogd again.

[[email protected] ~]# /etc/init.d/mcelogd start

Output:

[[email protected] ~]# /etc/init.d/mcelogd start
Starting mcelog daemon
[[email protected] ~]# /etc/init.d/mcelogd status
Checking for mcelog
mcelog (pid  23478) is running...
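
The three manual steps above (stop mcelogd, start hp-health, start mcelogd) can be wrapped in a small function. This is a minimal sketch, not something shipped with hp-health: the function name and the INIT_DIR variable are assumptions added for illustration, with INIT_DIR defaulting to /etc/init.d.

```shell
# INIT_DIR is a hypothetical override for the init script directory;
# it defaults to the location used in this post.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# start_hp_health_workaround
# Stop mcelogd, start hp-health, then bring mcelogd back up.
start_hp_health_workaround() {
    # Stop mcelogd first; ignore a failure in case it was not running
    "$INIT_DIR/mcelogd" stop || true
    # Start hp-health while mcelogd is down; give up if this fails
    "$INIT_DIR/hp-health" start || return 1
    # Restart mcelogd now that hpasmpld is running
    "$INIT_DIR/mcelogd" start
}
```

Run it as root with start_hp_health_workaround. The function only repeats the commands shown above in the same order, so it needs to be run again after every reboot unless the startup order is changed.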

A more permanent fix, one that survives a reboot, is to change the startup order of these two services so that hp-health starts before mcelogd:

Modify the file /etc/init.d/hp-health

[[email protected] ~]# vim /etc/init.d/hp-health

Change the following line in the file:

# chkconfig: 2345 91 2

to

# chkconfig: 2345 31 2
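
The same header change can be made without opening an editor. The sketch below assumes the header has the form "# chkconfig: <runlevels> <start> <stop>" as shown above; the function name set_start_priority is made up for this example. It is worth keeping a backup copy of the init script before running it.

```shell
# set_start_priority FILE PRIORITY
# Replace the start priority in a "# chkconfig: <levels> <start> <stop>"
# header line and leave the rest of FILE untouched. (Hypothetical helper.)
set_start_priority() {
    file=$1
    prio=$2
    # \1 keeps "# chkconfig: <levels> ", \2 keeps " <stop>"
    sed "s/^\(# chkconfig:  *[0-9-][0-9-]*  *\)[0-9][0-9]*\(  *[0-9][0-9]*\)/\1${prio}\2/" \
        "$file" > "$file.new" &&
        # Write back through the existing file so its permissions are kept
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}
```

For the change described here the call would be set_start_priority /etc/init.d/hp-health 31, run as root.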

Remove the hp-health service from chkconfig and add it again so that the new priority takes effect.

[[email protected] ~]# /sbin/chkconfig --del hp-health
[[email protected] ~]# /sbin/chkconfig --add hp-health

Ensure that hp-health now starts before mcelogd, that is, its SXX number in the rc directories is lower than the one for mcelogd.

[[email protected] ~]# ls -lah /etc/rc*.d/ | grep -E "hp-health|mcelogd"
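
Instead of comparing the listing by eye, the start numbers can be compared in a script. This is a rough sketch that assumes the usual SNNname link naming in the rc directories; the function names start_prio and starts_before are invented for this example.

```shell
# start_prio DIR SERVICE
# Print the NN part of DIR/SNN<SERVICE>, or fail if no such link exists.
start_prio() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # Skip the unexpanded pattern when nothing matched
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}      # strip the directory part
        name=${name#S}        # strip the leading S
        echo "${name%"$2"}"   # strip the service name, leaving NN
        return 0
    done
    return 1
}

# starts_before DIR FIRST SECOND
# Succeed if FIRST has a lower start number than SECOND in DIR.
starts_before() {
    a=$(start_prio "$1" "$2") || return 2
    b=$(start_prio "$1" "$3") || return 2
    # Drop one leading zero so the values compare as plain decimals
    [ "${a#0}" -lt "${b#0}" ]
}
```

For example, starts_before /etc/rc3.d hp-health mcelogd && echo "order is fine" checks runlevel 3. A return status of 2 means one of the two services has no start link in that directory.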
