This SOP describes the Nagios configurations.
Contact Information
Owner: Fedora Infrastructure Team
Contact: #fedora-admin, sysadmin-main & sysadmin-noc groups
Location: Anywhere
Servers: noc01, noc02, noc01.stg, puppet1
Purpose: This SOP describes the Nagios configurations.
Initial Configuration
CGI Access
To view information in Nagios (anything with cgi-bin in the path) you first need to grant yourself access. After checking out the Puppet CVS tree as described in the Puppet SOP, edit configs/system/nagios/cgi.cfg and append your FAS username to 'authorized_for_system_commands'.
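The edit described above can be sketched as a small Python helper. This is only an illustration, assuming the directive is stored in cgi.cfg as a single 'name=user1,user2' line; the function name add_authorized_user and the sample contents are hypothetical and are not part of the Puppet tree.

```python
def add_authorized_user(cfg_text, username,
                        directive="authorized_for_system_commands"):
    """Return cfg_text with username appended to the directive's user list.

    Assumes the directive is written as 'name=user1,user2' on one line.
    Commented-out lines and other directives are left untouched.
    """
    out = []
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == directive:
            users = [u.strip() for u in value.split(",") if u.strip()]
            if username not in users:  # avoid duplicate entries
                users.append(username)
            line = f"{directive}={','.join(users)}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical cgi.cfg contents, for illustration only.
sample = "use_authentication=1\nauthorized_for_system_commands=admin\n"
print(add_authorized_user(sample, "fasname"), end="")
```

Running the helper twice with the same username leaves the file unchanged, which makes it safe to re-apply.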
Contact Information
Create a new file named 'fasname.cfg' in configs/system/nagios/contacts/ with the following details:
define contact{
        contact_name                    fasname
        alias                           Real Name
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           Email address (any)
        }
Next, append your name to the 'members' section of configs/system/nagios/contactgroups/fedora-sysadmin-email.cfg.
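As a rough sketch of the two steps above, the following Python generates the contact definition and adds a name to a contact group's 'members' line. The helper names render_contact and add_member, the example group contents and the example.org address are assumptions made for illustration; the real files live in the Puppet tree and may be laid out differently.

```python
# Template mirroring the contact definition shown above.
CONTACT_TEMPLATE = """define contact{{
        contact_name                    {fasname}
        alias                           {real_name}
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           {email}
        }}
"""


def render_contact(fasname, real_name, email):
    """Return the text for configs/system/nagios/contacts/<fasname>.cfg."""
    return CONTACT_TEMPLATE.format(fasname=fasname, real_name=real_name,
                                   email=email)


def add_member(group_cfg, fasname):
    """Append fasname to the comma-separated 'members' line, if missing."""
    out = []
    for line in group_cfg.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "members":
            members = [m.strip() for m in parts[1].split(",") if m.strip()]
            if fasname not in members:
                members.append(fasname)
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}members {','.join(members)}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical contact group contents, for illustration only.
group = ("define contactgroup{\n"
         "    contactgroup_name fedora-sysadmin-email\n"
         "    members admin1,admin2\n"
         "}\n")
print(render_contact("fasname", "Real Name", "fasname@example.org"))
print(add_member(group, "fasname"))
```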
nagios-external
The same changes need to be applied to the nagios-external configuration (configs/system/nagios-external).
Commit Changes
Commit changes by running cvs commit -m "Adding fasname to Nagios"
and then mark the changes for distribution by running make install.
Configuration
Instances
Fedora Project runs two Nagios instances: nagios (noc01) and nagios-external (noc02). You must be in the 'sysadmin' group to access them.
Staging Instances
Apart from the two production instances, we currently run a staging instance for testing purposes, available through SSH at noc01.stg.
nagios (noc01)
The nagios configuration on noc01 should only monitor general host statistics - puppet status, uptime, apache status (up/down), SSH etc.
The configurations are found at configs/system/nagios/ in the Puppet tree.
nagios-external (noc02)
The nagios-external instance on noc02 is located outside of our main datacenter and should monitor our user-facing websites and applications (fedoraproject.org, FAS, PackageDB, Bodhi/Updates).
The configurations are found at configs/system/nagios-external/ in the Puppet tree.
Production and staging instances through SSH
Note: Please make sure you are in the 'sysadmin' and 'sysadmin-noc' FAS groups before trying to access these hosts.
First, connect to the bastion host:
ssh UID@bastion.fedoraproject.org
(Note: no password is required for this login, so if you are prompted for one at this point, you probably don't have access to this machine.)
Then connect to the Nagios host with ssh UID@noc01 for the production system, or ssh UID@noc01.stg for the staging one.
(Note: you will be prompted for a password, which is the one registered for your FAS account.)
NRPE
We currently use NRPE to execute Nagios plugins remotely on any host in our network.
A good guide to NRPE and its usage, with diagrams of its structure, can be found at this link
Understanding the Messages
General
Nagios notifications are generally easy to read, and follow this consistent format:
** PROBLEM/ACKNOWLEDGEMENT/RECOVERY alert - hostname/Check is WARNING/CRITICAL/OK **
** HOST DOWN/UP alert - hostname **
Reading the message will provide extra information on what is wrong.
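For anyone filtering or sorting these notifications automatically, the format above can be matched with two regular expressions. The sketch below assumes the subjects use exactly the spacing shown above; the function name parse_subject and the sample host and check names are hypothetical.

```python
import re

# Service notifications: "** TYPE alert - hostname/Check is STATE **"
SERVICE_RE = re.compile(
    r"^\*\* (?P<type>PROBLEM|ACKNOWLEDGEMENT|RECOVERY) alert - "
    r"(?P<host>[^/]+)/(?P<check>.+) is (?P<state>WARNING|CRITICAL|OK) \*\*$"
)

# Host notifications: "** HOST DOWN/UP alert - hostname **"
HOST_RE = re.compile(r"^\*\* HOST (?P<state>DOWN|UP) alert - (?P<host>.+) \*\*$")


def parse_subject(subject):
    """Return a dict describing the notification, or None if unrecognized."""
    subject = subject.strip()
    match = SERVICE_RE.match(subject)
    if match:
        return {"kind": "service", **match.groupdict()}
    match = HOST_RE.match(subject)
    if match:
        return {"kind": "host", **match.groupdict()}
    return None


# Hypothetical subjects, for illustration only.
print(parse_subject("** PROBLEM alert - host1/Disk Space is CRITICAL **"))
print(parse_subject("** HOST DOWN alert - host1 **"))
```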
Disk Space Warning/Critical
Disk space warnings normally include the following information:
DISK WARNING/CRITICAL/OK - free space: mountpoint freespace(MB) (freespace(%) inode=freeinodes(%)):
A message stating "(1% inode=99%)" means that 1% of the disk space and 99% of the inodes are free. The disk space is critical, not the inode usage, which is a sign that more disk space is required.
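The same reading can be done programmatically. The following Python is a minimal sketch that assumes the message matches the layout shown above, with free space in MB; the function names and the sample message are hypothetical, and real plugin output may differ in spacing or trailing characters.

```python
import re

# Matches: DISK STATE - free space: mountpoint N MB (P% inode=Q%)
DISK_RE = re.compile(
    r"DISK (?P<state>WARNING|CRITICAL|OK) - free space: "
    r"(?P<mount>\S+) (?P<free_mb>\d+) MB "
    r"\((?P<free_pct>\d+)% inode=(?P<inode_pct>\d+)%\)"
)


def parse_disk_message(message):
    """Return the parsed fields as a dict, or None if there is no match."""
    match = DISK_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("free_mb", "free_pct", "inode_pct"):
        info[key] = int(info[key])
    return info


def scarce_resource(info):
    """Name whichever resource has the lower free percentage."""
    if info["free_pct"] <= info["inode_pct"]:
        return "disk space"
    return "inodes"


# Hypothetical message, for illustration only.
info = parse_disk_message("DISK CRITICAL - free space: / 120 MB (1% inode=99%):")
print(info)
print("Running low on:", scarce_resource(info))
```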