Performance graph problems in Nagios
The performance graphs in Nagios may not display data even though your checks are returning performance data.
When the 'performance data' feature is enabled, Nagios builds performance graphs that are updated automatically each time a check runs.
The checks return performance data, and the results are collected in RRD databases.
The data sources in an RRD database are fixed. After Nagios checks are updated, however, the number or the names of the data sources in a check result may change. When that happens, the performance graph stops updating.
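One way to see which data sources an RRD file was created with is to read the ds[...] entries printed by "rrdtool info". The sketch below is only an illustration: the function name rrd_ds_names is hypothetical, and it assumes the usual "ds[NAME].index = N" lines in the rrdtool info output.

```shell
# Read `rrdtool info` output on stdin and print one data source name per line.
# Assumes each data source appears once as a line of the form: ds[NAME].index = N
rrd_ds_names() {
    sed -n 's/^ds\[\(.*\)]\.index = .*/\1/p'
}
```

It could be used as "rrdtool info /path/to/check.rrd | rrd_ds_names" (the path is a placeholder). If the list is shorter than, or named differently from, the values the check now returns, that mismatch is the situation described above.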
How to fix performance graph problems in Nagios?
The following is a systematic approach to troubleshooting performance graph problems in Nagios.
Work through the steps below to fix the issue.
1) Ensure that Performance Data is enabled
First, make sure that performance data is enabled.
To do this, navigate to Admin > System Information > Monitoring Engine Status.
Make sure that the Performance Data process is green.
2) Count The Number Of Spooled Files
Nagios spools performance data into small files. If processing of those files stops, they begin to pile up in the spool directories.
The following commands will count the number of files:
# ls /usr/local/nagios/var/spool/perfdata/ | wc -l
# ls /usr/local/nagios/var/spool/xidpe/ | wc -l
If either count is greater than 20000, the processes have most likely become caught in a loop, and the spooled files need to be deleted.
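The count and the comparison can be combined in a small helper. This is a minimal sketch, not part of Nagios: the function names are hypothetical, it uses find instead of ls so that only regular files are counted, and it assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
# Print the number of regular files directly inside a directory.
# Prints 0 if the directory does not exist.
count_spool_files() {
    spool_dir=$1
    if [ -d "$spool_dir" ]; then
        find "$spool_dir" -maxdepth 1 -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Report the file count of each directory and flag counts above the threshold.
# Usage: check_spool THRESHOLD DIR...
check_spool() {
    threshold=$1
    shift
    for dir in "$@"; do
        n=$(count_spool_files "$dir")
        if [ "$n" -gt "$threshold" ]; then
            echo "$dir: $n files (above $threshold)"
        else
            echo "$dir: $n files"
        fi
    done
}
```

For the two directories in this step, the call would be: check_spool 20000 /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe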
To remove a large number of files from the directory, execute this command:
# find /usr/local/nagios/var/spool/perfdata/ -type f -delete
After deleting the files, wait about thirty minutes to see whether the performance graphs start working again.
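Besides looking at the graphs themselves, one option is to check whether any RRD files have been written recently. The sketch below assumes a find that supports -mmin (GNU and BSD find do); the function name recent_rrds is hypothetical, and the directory in the usage note is an assumption that does not come from this article.

```shell
# List .rrd files under a directory that were modified in the last N minutes.
# Usage: recent_rrds DIR MINUTES
recent_rrds() {
    rrd_dir=$1
    minutes=$2
    find "$rrd_dir" -type f -name '*.rrd' -mmin "-$minutes"
}
```

For example, "recent_rrds /usr/local/nagios/share/perfdata 30" would list files updated in the last half hour, assuming the RRD files live under that directory on your system. An empty result suggests that performance data is still not being processed.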
3) Increase Performance Data Logging Verbosity
If deleting the spooled files doesn't help, increase the performance data logging verbosity.
Edit the following file from an SSH session and change the LOG_LEVEL value from 0 to 2:
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Now the process_perfdata.pl script should log all errors and debug information to the file /usr/local/nagios/var/perfdata.log.
You can watch it by using the following command:
# tail -f /usr/local/nagios/var/perfdata.log
Watch for any errors, wrong exit codes, and/or timeouts.
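If the log is busy, it can help to filter it instead of reading every line. This is a rough sketch: the function name is hypothetical, and the search pattern is only a guess at typical wording, not the documented format of the log, so adjust it to what you actually see.

```shell
# Print log lines (with line numbers) that mention errors or timeouts.
# The pattern is an assumption about common wording, matched case-insensitively.
scan_perfdata_log() {
    grep -inE 'error|timeout|timed out' "$1"
}
```

For the log in this step, the call would be: scan_perfdata_log /usr/local/nagios/var/perfdata.log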
When you have finished troubleshooting, remember to return LOG_LEVEL to its default value of 0.
A common error found in this log is a timeout. As a temporary fix, you can increase the timeout of the performance data processor by raising the TIMEOUT value in the process_perfdata.cfg file.
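The edits to LOG_LEVEL and TIMEOUT can also be made from the command line. The helper below is a sketch under assumptions: the name set_cfg_value is hypothetical, it assumes the file uses "KEY = value" lines, and it is only meant for simple values such as numbers (characters like / or & in the value would break the sed expression). It keeps a .bak copy of the file before changing it.

```shell
# Set KEY to VALUE in a "KEY = value" style config file.
# A copy of the original is kept as FILE.bak.
# Usage: set_cfg_value FILE KEY VALUE
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    # Replace everything after "KEY =" on the matching line, keeping the spacing.
    sed "s/^\([[:space:]]*${key}[[:space:]]*=[[:space:]]*\).*/\1${value}/" "$file.bak" > "$file"
}
```

For example, "set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 2" raises the verbosity, and the same call with 0 restores the default afterwards.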
4) Increase NPCD Logging Verbosity
NPCD is a mass processing tool that collects and processes the performance data.
Edit the following file in an SSH session and change the log_level value from 0 to -1 to increase the logging verbosity:
/usr/local/nagios/etc/pnp/npcd.cfg
Then restart the NPCD service so that the new setting takes effect.
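The exact restart command depends on how services are managed on the host. The sketch below assumes the service is registered under the name npcd, which this article does not state, and uses systemctl when it is available and the older service command otherwise. The function name and the DRY_RUN variable are hypothetical.

```shell
# Restart a service with whichever service manager is present.
# With DRY_RUN=1 the command is printed instead of executed.
# Usage: restart_service NAME
restart_service() {
    name=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi
    if command -v systemctl >/dev/null 2>&1; then
        $run systemctl restart "$name"
    else
        $run service "$name" restart
    fi
}
```

Running "DRY_RUN=1 restart_service npcd" first shows which command would be used without touching the service.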
When you have finished troubleshooting, remember to return log_level to its default value.
NPCD should now log all errors and debug data to the file /usr/local/nagios/var/npcd.log. You can watch this using the following command:
# tail -f /usr/local/nagios/var/npcd.log
A common error in this log file indicates that NPCD is hitting its load threshold.
You can raise this threshold by editing the following file and setting load_threshold to a higher value:
/usr/local/nagios/etc/pnp/npcd.cfg
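Before raising the threshold, it may be worth comparing the configured value with the current system load. This is a minimal sketch with hypothetical function names; it assumes npcd.cfg uses "key = value" lines, and the usage note assumes a Linux host where /proc/loadavg exists.

```shell
# Print the value of KEY from a "key = value" style config file.
# Usage: get_cfg_value FILE KEY
get_cfg_value() {
    awk -F'=' -v key="$2" '
        { k = $1; gsub(/[[:space:]]/, "", k) }
        k == key { v = $2; gsub(/[[:space:]]/, "", v); print v; exit }
    ' "$1"
}

# Succeed (exit status 0) if LOAD is greater than THRESHOLD.
# Both arguments may be decimal numbers.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}
```

A possible check: load_exceeds "$(cut -d' ' -f1 /proc/loadavg)" "$(get_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold)" && echo "load is above load_threshold"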
5) Check Nagios User Account
In some cases, the Nagios user account expires, which can cause problems like this.
Run the following command to see whether the Nagios user account has expired:
# chage -l nagios
If the account has expired, you can re-enable it with the command below, which removes the inactivity and expiration limits and resets the password aging limits:
# chage -I -1 -m 0 -M 99999 -E -1 nagios
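To confirm the account state before or after the change, the expiry line can be pulled out of the chage listing. This sketch uses a hypothetical function name and assumes the English "Account expires" label, so it may need LANG=C on systems with a localized chage.

```shell
# Read `chage -l` output on stdin and print the "Account expires" value.
account_expiry() {
    awk -F': *' '/^Account expires/ { print $2; exit }'
}
```

For example, "chage -l nagios | account_expiry" should print never once the expiration has been removed.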