From Fedora Project Wiki

Agent-Free Systems Management

Summary

On server-class systems, remotely monitoring and managing hardware health and configuration across a large number of systems is crucial. One important component of this systems management solution on each server is the Service Processor. System administrators and monitoring/configuration software (such as Nagios) connect to the Service Processor via shared or dedicated management networks.

The information provided by the Service Processor is mostly independent of the operating system running on the server. Systems management software installed on the operating system can provide a richer overall set of systems management functionality. However, such software running on Linux is specific to the server vendor and may be proprietary. It can also be bulky, and it must be validated and managed like any other application.

We envision an ideal systems management solution in which the Service Processor and the operating system “just work” together, without requiring vendor-specific (and sometimes proprietary) software and without a major loss of features.

The goal of this feature is to replace some of the important functionality of the systems management software usually installed on the operating system with a native implementation. This will also make better use of standards that Service Processors already support, such as IPMI and WSMAN.

Owner

Current status

  • Targeted release: Fedora 18
  • Last updated: 2012-07-23
  • Percentage of completion: 30 %

Detailed Description

  • Publish OS information to Service Processor
    • Purpose: OS information should be accessible via the Service Processor remotely.
    • On systems that contain a Service Processor, publish the following to the Service Processor at each boot:
      • “OS Name”, Example: “Fedora”
      • “OS Version”, Example: “17”
      • “System Host Name”, Example: “fedora.example.com”
    • Achieved with a mix of standard IPMI and Dell OEM (delloem) commands
  • Heartbeat to Service Processor
    • Purpose: Capture a screenshot in the Service Processor for debugging when the system crashes at runtime or during installation
    • At each startup and during installation, set up the IPMI watchdog via the freeipmi-bmc-watchdog systemd service
  • Retrieve log from Service Processor
    • Purpose: Have syslog record Service Processor events so that system administrators have a single log for both OS and Service Processor events.
    • An OS daemon should fetch logs from the Service Processor, preferably with filtering capability.
    • This goes beyond what ipmievd currently provides.
  • Support for redirection of SNMP
    • Purpose: Use the SNMP agent on the Service Processor and provide access to the Service Processor MIB via the OS's SNMP agent.
    • Redirect selected SNMP queries to the Service Processor's IP address
    • Redirect traps from the Service Processor to Fedora's trap destination.
    • Detect the Service Processor interface IP address and configure SNMP redirection and traps accordingly
    • Example on Dell PowerEdge servers: redirect OIDs under .1.3.6.1.4.1.674.10892.2 to the Service Processor.
  • Include IPMI support in anaconda
    • Purpose: Set and retrieve systems management information at install time
    • Use-case: Communicate the various stages of the anaconda installation to the Service Processor to aid in debugging installation issues.
    • Use-case: Capture a screenshot from the Service Processor if the system crashes or hangs during installation.
  • Publish Service Processor URL and IP via WSMAN
    • Purpose: One-to-many management consoles can launch the Service Processor management console by retrieving its URL from an OS-based agent. Moving this functionality into the OS provides the same feature without installing an additional application in the OS.
    • Retrieve the IP address and URL of the Service Processor and expose them via the standard DMTF namespace
    • Needs to be dynamic: any change to the Service Processor IP address or URL should be reflected on the host OS and in the results of wsman queries
    • Can use the existing CIM namespace: https://sblim.sf.net/wbem/wscim/1/cim-schema/2
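For the "Publish OS information" item, the standard IPMI part could plausibly use the Set System Info Parameters command. The Python sketch below splits a string into the 16-byte blocks that command carries and renders each block as an ipmitool raw invocation. The NetFn and command numbers, the parameter selectors (2 for system name, 4 for OS name), and the block layout reflect our reading of the IPMI v2.0 specification. This page does not specify them. The OS version is not covered, because it goes through a Dell OEM parameter. The optional "set in progress" handshake is also omitted.

```python
# Sketch only. ASSUMPTION: NetFn/command numbers, parameter selectors and the
# 16-byte block layout follow our reading of the IPMI v2.0 "Set System Info
# Parameters" command; check them against your BMC before use.

NETFN_APP = 0x06
CMD_SET_SYSTEM_INFO = 0x58

PARAM_SYSTEM_NAME = 0x02  # e.g. "fedora.example.com"
PARAM_OS_NAME = 0x04      # e.g. "Fedora"

BLOCK_SIZE = 16           # data bytes carried by each set-selector block
ENCODING_ASCII = 0x00     # encoding nibble 0h: ASCII + Latin-1


def sysinfo_blocks(param: int, text: str) -> list[list[int]]:
    """Return one request body per block: [param, set_selector, 16 data bytes]."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("string does not fit the one-byte length field")
    # Block 0 starts with the encoding byte and the total string length,
    # so it carries only 14 characters; later blocks carry 16.
    stream = bytes([ENCODING_ASCII, len(data)]) + data
    blocks = []
    for set_selector, start in enumerate(range(0, len(stream), BLOCK_SIZE)):
        # Zero-pad the final block to the full 16 bytes.
        chunk = stream[start:start + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
        blocks.append([param, set_selector, *chunk])
    return blocks


def ipmitool_raw_commands(param: int, text: str) -> list[str]:
    """Render each block as an 'ipmitool raw <netfn> <cmd> <data...>' line."""
    commands = []
    for body in sysinfo_blocks(param, text):
        args = [NETFN_APP, CMD_SET_SYSTEM_INFO, *body]
        commands.append("ipmitool raw " + " ".join(f"0x{b:02x}" for b in args))
    return commands


if __name__ == "__main__":
    for line in ipmitool_raw_commands(PARAM_OS_NAME, "Fedora"):
        print(line)
    for line in ipmitool_raw_commands(PARAM_SYSTEM_NAME, "fedora.example.com"):
        print(line)
```

Running it prints one command for "Fedora" and two for "fedora.example.com". Block 0 holds only 14 characters after the encoding and length bytes, so the 18-character host name spills into a second block.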
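For the SNMP redirection item, Net-SNMP's snmpd already provides a proxy directive that forwards an OID subtree to another agent. The minimal sketch below assumes SNMPv2c. The Service Processor address and community string are placeholders. It builds such a line for the Dell subtree named above. It also shows the membership test a redirector needs: OIDs must be compared arc by arc, because a plain string-prefix check would wrongly send .1.3.6.1.4.1.674.10892.20 to the Service Processor. Trap redirection is not covered.

```python
# Sketch only. ASSUMPTIONS: SNMPv2c, and the SP address and community string
# passed in are placeholders; traps are not handled here.

DELL_SP_SUBTREE = ".1.3.6.1.4.1.674.10892.2"  # example subtree from the text


def parse_oid(oid: str) -> tuple[int, ...]:
    """Turn '.1.3.6' or '1.3.6' into a tuple of integer arcs."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))


def is_redirected(oid: str, subtree: str = DELL_SP_SUBTREE) -> bool:
    """True if oid lies at or under subtree, compared arc by arc.

    A string prefix test would wrongly match .1.3.6.1.4.1.674.10892.20
    against .1.3.6.1.4.1.674.10892.2.
    """
    arcs, root = parse_oid(oid), parse_oid(subtree)
    return arcs[:len(root)] == root


def proxy_line(sp_ip: str, community: str, subtree: str = DELL_SP_SUBTREE) -> str:
    """Build a Net-SNMP snmpd.conf 'proxy' directive forwarding subtree to the SP."""
    return f"proxy -v 2c -c {community} {sp_ip} {subtree}"


if __name__ == "__main__":
    print(proxy_line("192.0.2.10", "public"))  # 192.0.2.10: documentation-only address
```

Generating the directive from the detected Service Processor address, rather than hard-coding it, lets the configuration follow the "Detect Service Processor interface IP" step.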

Benefit to Fedora

  • Fedora users with servers that contain Service Processors do not have to install additional software dedicated to systems management and can still expect standard pieces of information to be available remotely.
  • Assists with debugging system failures (panic, hang, etc.) remotely.

Scope

The new features will require additions/modifications to:

  • Fedora’s start-up scripts and default configuration files of certain services
  • Support for IPMI in Anaconda

How To Test

  • Install Fedora on a test machine with a Service Processor
  • Publish OS information to Service Processor
    • The Service Processor should report the OS name and version via its supported interfaces
  • Heartbeat to Service Processor
    • With the watchdog daemon configured, a kernel panic or system crash should cause the system to reboot after the configured timeout, and a screenshot of the crash and/or an entry in the System Event Log (SEL) should be recorded.
  • Retrieve log from Service Processor
    • syslog should contain IPMI SEL events logged by ipmievd
  • Support for redirection of SNMP
    • After configuring /etc/snmp/snmpd.conf, SNMP queries to Fedora for the Service Processor OIDs should succeed and return the same values that would otherwise be retrieved via the Service Processor's SNMP agent.
  • Include IPMI support in anaconda
    • At install time, ipmitool or freeipmi commands should be available for use in the kickstart pre-installation (%pre) section.
  • Publish Service Processor URL and IP via WSMAN
    • Get Service Processor URL and IP by querying the OS's wsman server.
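The heartbeat test above relies on the IPMI watchdog's countdown. As a rough sketch of what a service such as freeipmi-bmc-watchdog sends to the BMC, the Python below builds the Set Watchdog Timer request (NetFn 0x06, command 0x24) and the Reset Watchdog Timer request (command 0x22). The byte layout is based on our reading of IPMI v2.0. The SMS/OS timer use with a hard-reset action is an illustrative choice. The real service's defaults may differ.

```python
# Sketch only. ASSUMPTION: field layout follows our reading of the IPMI v2.0
# "Set Watchdog Timer" (NetFn 0x06, cmd 0x24) and "Reset Watchdog Timer"
# (cmd 0x22) commands; defaults used by freeipmi-bmc-watchdog may differ.

NETFN_APP = 0x06
CMD_RESET_WATCHDOG = 0x22
CMD_SET_WATCHDOG = 0x24

TIMER_USE_OS_LOAD = 0x03  # plausible choice while the installer runs
TIMER_USE_SMS_OS = 0x04   # plausible choice for the runtime heartbeat

ACTION_NONE = 0x00
ACTION_HARD_RESET = 0x01
ACTION_POWER_DOWN = 0x02
ACTION_POWER_CYCLE = 0x03


def set_watchdog_body(timeout_s: float, timer_use: int = TIMER_USE_SMS_OS,
                      action: int = ACTION_HARD_RESET) -> list[int]:
    """Return the six data bytes of a Set Watchdog Timer request."""
    count = round(timeout_s * 10)  # the countdown is kept in 100 ms units
    if not 1 <= count <= 0xFFFF:
        raise ValueError("timeout must be between 0.1 s and 6553.5 s")
    return [
        timer_use & 0x07,     # byte 1: timer use; log on expiry, stop timer on Set
        action & 0x07,        # byte 2: timeout action, no pre-timeout interrupt
        0x00,                 # byte 3: pre-timeout interval (unused here)
        1 << timer_use,       # byte 4: clear this timer use's expiration flag
        count & 0xFF,         # byte 5: initial countdown, LSB
        (count >> 8) & 0xFF,  # byte 6: initial countdown, MSB
    ]


def raw(cmd: int, *body: int) -> str:
    """Render a NetFn App request as an 'ipmitool raw' command line."""
    return "ipmitool raw " + " ".join(f"0x{b:02x}" for b in (NETFN_APP, cmd, *body))


if __name__ == "__main__":
    print(raw(CMD_SET_WATCHDOG, *set_watchdog_body(900)))  # arm: 15 minutes
    print(raw(CMD_RESET_WATCHDOG))  # heartbeat: send well before expiry
```

In this model, the heartbeat consists of arming the timer once and then resetting it more often than the timeout. If the OS hangs or panics, the resets stop and the BMC takes the configured action when the countdown expires. As we understand the specification, the countdown starts with the first Reset Watchdog Timer command, not with the Set command.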
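The syslog check above covers what ipmievd already does. The "Retrieve log from Service Processor" item also asks for a daemon that fetches Service Processor logs with filtering. The minimal sketch below covers the filtering and de-duplication step. It assumes the pipe-separated layout that ipmitool sel list typically prints (record id, date, time, sensor, event, direction). That layout can differ between ipmitool versions and BMCs. The sensor keyword filter and the "seen" set of already processed record ids are our own illustrative design, not part of ipmievd or any existing tool.

```python
# Sketch only. ASSUMPTION: 'ipmitool sel list' prints pipe-separated lines
# (id | date | time | sensor | event | direction); layouts can vary by
# ipmitool version and BMC. The keyword filter and seen-set are our own design.
import syslog
from typing import Iterable, Optional


def parse_sel_line(line: str) -> Optional[dict[str, str]]:
    """Split one SEL line into named fields, or return None if it is not an event."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    return {
        "id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": fields[4],
        "direction": fields[5] if len(fields) > 5 else "",
    }


def sel_messages(lines: Iterable[str], include: tuple[str, ...] = (),
                 seen: Optional[set[str]] = None) -> list[str]:
    """Return log messages for new events whose sensor matches any keyword.

    'seen' collects every id already processed so repeated polling does not
    duplicate entries; an empty 'include' forwards every sensor.
    """
    messages = []
    for line in lines:
        rec = parse_sel_line(line)
        if rec is None:
            continue
        if seen is not None:
            if rec["id"] in seen:
                continue
            seen.add(rec["id"])
        sensor = rec["sensor"].lower()
        if include and not any(k.lower() in sensor for k in include):
            continue
        msg = f"SEL {rec['id']}: {rec['sensor']}: {rec['event']}"
        if rec["direction"]:
            msg += f" ({rec['direction']})"
        messages.append(msg)
    return messages


def forward_to_syslog(messages: Iterable[str]) -> None:
    """Write each message to the local syslog under the daemon facility."""
    syslog.openlog("sel-forward", 0, syslog.LOG_DAEMON)
    for msg in messages:
        syslog.syslog(syslog.LOG_WARNING, msg)
```

A poller could run ipmitool sel list periodically and pass the output lines to sel_messages with a persistent seen set. It would then hand the result to forward_to_syslog, so the OS and Service Processor events end up in one log.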

User Experience

  • Easier management of large networks using industry-standard technologies.
  • Minimal user intervention required to configure Service Processor access details in the OS.

Dependencies

  • systemd
  • freeipmi
  • freeipmi-bmc-watchdog
  • ipmitool
  • anaconda
  • dracut

Contingency Plan

  • Document with script-lets on how similar functionality can be achieved.
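As an illustration of such a script-let, the following Python sketch extracts the Service Processor's IP address from ipmitool lan print output. Both the SNMP redirection and the WSMAN items need this address. The sketch assumes the "Label : value" layout that ipmitool uses and treats 0.0.0.0 as unconfigured. It builds a management URL of the form https://<ip>/. That URL pattern is a hypothetical placeholder, because the real console URL is vendor-specific.

```python
# Sketch only. ASSUMPTIONS: 'ipmitool lan print' uses "Label : value" lines;
# the https://<ip>/ console URL is a hypothetical, vendor-neutral placeholder.
import ipaddress
from typing import Optional


def parse_lan_print(text: str) -> dict[str, str]:
    """Map each 'Label : value' line to {label: value}, keeping the first occurrence."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses keep their colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def service_processor_ip(text: str) -> Optional[str]:
    """Return the SP's IP address, or None if missing or unconfigured (0.0.0.0)."""
    value = parse_lan_print(text).get("IP Address", "")
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return None
    return None if ip.is_unspecified else str(ip)


def service_processor_url(ip: str) -> str:
    """Hypothetical console URL; IPv6 literals need brackets inside a URL."""
    host = f"[{ip}]" if ipaddress.ip_address(ip).version == 6 else ip
    return f"https://{host}/"
```

The address can change at runtime, so a caller would re-run this on each query or at intervals rather than cache it once. That matches the "needs to be dynamic" requirement for the WSMAN item.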

Documentation

Release Notes

  • Agent-free remote systems management of Fedora systems via the Service Processor.

Comments and Discussion