ServerPreciseCloudPowerManagement

  • Launchpad Entry: servercloud-p-cloud-power-management

  • Created: 2010-05-15

  • Contributors: Arnaud Quette

  • Packages affected: nut, fence-agents, powernap, powerwake, cobbler, orchestra, juju, openstack, eucalyptus


Warning: This is a WIP document


Summary

Precise will provide advanced power management features for servers and infrastructures.

Release Note

This section should include a paragraph describing the end-user impact of this change. It is meant to be included in the release notes of the first release in which it is implemented. (Not all of these will actually be included in the release notes, at the release manager's discretion; but writing them is a useful exercise.)

It is mandatory.

Rationale

With the fast-growing adoption of the Cloud, the lack of consistent and efficient power management has become a major concern.

Cloud infrastructures currently run blindly with regard to power management. Cloud nodes are generally protected by UPSes, but often no protection software is running. So when a UPS runs out of battery runtime, Cloud nodes generally crash badly. The same goes for power supply monitoring, which is currently non-existent.

In the diagram below, imagine that PSU #1 is faulty. Now, what if UPS #2 reaches low battery and can no longer power the only working PSU? The CLC (Cloud Controller) will crash!

Figure: example of a server whose power supplies are fed by two UPSes (ServerOneiricInfraPower-examplePSU.png)

Moreover, when deploying new loads (physical servers and virtual machines), neither the status of the UPS (i.e. online or on battery) nor its efficiency is considered. This significantly lowers the overall power efficiency of these infrastructures.

It is typical for only 50% of the incoming power to actually be used by the servers.

In other words, half of the power you pay for is wasted as heat, which then needs to be cooled! Since "you can't control or manage what you don't measure", the time has come to manage power!
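As a worked example, the 50% figure can be expressed as Power Usage Effectiveness (PUE). PUE is the ratio of total facility power to the power that actually reaches the IT equipment. Lower is better, and 1.0 is the ideal.

```latex
\mathrm{PUE} = \frac{P_{\mathrm{facility}}}{P_{\mathrm{IT}}}, \qquad
P_{\mathrm{IT}} = 0.5\,P_{\mathrm{facility}} \;\Rightarrow\; \mathrm{PUE} = \frac{1}{0.5} = 2.0
```

For instance, a room drawing an illustrative 10 kW from the grid would then deliver only 5 kW to its servers. The remaining 5 kW would go to power conversion losses, cooling and other overhead.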

Ubuntu provides several power management tools, dedicated to:

  • protection (NUT),
  • measurement (NUT),
  • management (NUT and fence-agents),
  • efficiency and availability (PowerWake and PowerNap).

But there are still several issues:

  • NUT, PowerWake/PowerNap and fence-agents are separate systems, with no consolidated data or visibility,

  • NUT still lacks the configuration tools and web management interfaces needed to provide a good user experience,
  • To power up cloud compute nodes, PowerWake uses WoL, while Cobbler / Orchestra use fence-agents (which support PDUs). NUT could be a more reliable replacement for both WoL and fence-agents.

This specification defines the remaining tasks and missing tools required to provide complete infrastructure power management on Ubuntu systems, easily and by default.

You may also want to read the previous specification, written during the Oneiric cycle, which provides additional information.

User stories

  • Chuck is running an OpenStack cloud infrastructure. It is composed of an infrastructure server that deploys the diskless compute nodes through PXE. He would like to power these servers on demand, starting them only when needed.
  • Tim is running a Eucalyptus Cloud infrastructure. He wants to protect all his systems from power issues. This means that power protection software must be deployed and configured early.
  • Nick is running a Cloud in a big datacenter. He would love to optimize the power availability and usage of his infrastructure, to make it greener and more efficient. Supporting all of his UPSes and PDUs is the minimum. Adding support for the power supplies in his servers would improve his visibility. Having PowerWake / PowerNap play nicely with power management would be perfect.

  • Andres is deploying a new private Cloud. He first wants to inventory all of his infrastructure (including servers, storage, UPSes and PDUs, ...). He then wants to put all this data in a central infrastructure server that provisions all of the server deployment logic (including the operating system and power protection).

Design

The general idea is to provide Cloud power management through the software deployed to protect and manage the infrastructure (NUT, PowerNap and fence-agents).

All the above user stories can be summed up in the following sequence:

  • (Optional) Hardware inventory: detect hardware, to provision software,
  • Infrastructure provisioning (using hardware inventory data or NUT discovery) and software deployment. This step may include a full PXE deployment of the OS, or just the power protection software,
  • Cloud scheduler using the power systems data to make smarter deployment decisions.

This sequence will be demonstrated in a separate article, and illustrated below:

Figure: example Cloud power management sequence (ServerPreciseCloudPowerManagement-exampleCloud.png)

Hardware inventory

The first requirement is the ability to inventory the power devices (UPS, PDU and server power supplies) present on the network into inventory systems (OCS Inventory NG and Fusion Inventory).

In the end, all hardware is inventoried, including servers (and their components) and power devices, and is available for use by provisioning systems such as Cobbler / Orchestra.

Note that Fusion Inventory, for example, provides webservice interfaces (JSON, XML-RPC, REST) for data retrieval.

Provisioning Power protection

Ubuntu could be the first to provide easy-to-deploy power protection and smart power management, and thus full visibility on power.

NUT now provides the ability to support the whole chain of power devices, including:

  • the power supplies of the servers (IPMI),
  • the PDUs that distribute power (and allow remote off/on/cycle operations),
  • the UPSes that provide backup runtime in case of power failure, as well as power conditioning.

By having NUT widely deployed in infrastructures, Cloud systems could gain more safety. For example, each node would still be able to shut itself down when its UPS runs out of battery runtime.

This requires provisioning support in Cobbler / Orchestra.

Figure: power protection provisioning in Cobbler (Cobbler-power-protection1.png)

Data from the inventory systems will be provided through a Web UI, which will help provision the power protection configuration and deploy NUT.

Cloud power awareness

The Cloud scheduler could use this power information to better select which servers to start, or which servers are candidates for hosting new VMs and computations.

This can take into account:

  • power status (i.e. is the UPS protecting this node running online or on battery?),
  • power load (i.e. the closer a UPS gets to 100% load, the more efficient it is; in other words, the higher the load, the better the PUE and overall efficiency),
  • battery runtime (i.e. select the zone where the UPS has the most remaining runtime).
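To make these three criteria concrete, here is a minimal Python sketch of a node-selection policy. It assumes that the scheduler already holds, for each node, the NUT variables ups.status, ups.load and battery.runtime of the UPS protecting that node. The 80% headroom cap, the class and the function names are hypothetical, not part of any existing scheduler.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical policy knob: keep some headroom on every UPS (percent of nominal load).
MAX_UPS_LOAD = 80.0


@dataclass
class NodePower:
    """Power readings of the UPS protecting one compute node (as reported by NUT)."""
    name: str
    ups_status: str       # 'ups.status', e.g. "OL", "OL CHRG", "OB DISCHRG", "OB LB"
    ups_load: float       # 'ups.load', percent
    battery_runtime: int  # 'battery.runtime', seconds


def is_online(node: NodePower) -> bool:
    """True when the UPS is on line power and neither on battery nor low battery."""
    flags = node.ups_status.split()
    return "OL" in flags and "OB" not in flags and "LB" not in flags


def pick_node(nodes: Sequence[NodePower]) -> Optional[NodePower]:
    """Pick a node for new load, or return None if no node qualifies."""
    candidates = [n for n in nodes if is_online(n) and n.ups_load < MAX_UPS_LOAD]
    if not candidates:
        return None
    # Prefer the most loaded UPS (better efficiency), then the longest runtime.
    return max(candidates, key=lambda n: (n.ups_load, n.battery_runtime))


if __name__ == "__main__":
    nodes = [
        NodePower("node1", "OL", 35.0, 1800),
        NodePower("node2", "OB DISCHRG", 60.0, 900),
        NodePower("node3", "OL CHRG", 55.0, 1200),
        NodePower("node4", "OL", 85.0, 2400),
    ]
    chosen = pick_node(nodes)
    print(chosen.name if chosen else "no suitable node")  # node3
```

Ordering by load first and runtime second is only one possible trade-off, because a more loaded UPS also has less runtime left. A real scheduler would likely weigh both criteria.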

Cloud power management

As a replacement for, or a complement to, WoL (Wake-on-LAN) and fence-agents (simple PDU support), NUT can manage (start, stop, restart) many UPS and PDU models in an easy and generic way.

This would allow any node of the infrastructure to be powered up or down using a single, common mechanism.

NUT also provides a tool to discover all these supported devices, along with NUT data servers on the network.

Figure: NUT power management in Cobbler (Cobbler-power-mgt1-1.png)

Figure: NUT power management in Cobbler, continued (Cobbler-power-mgt2-1.png)

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed.

Hardware inventory

NUT provides a tool called 'nut-scanner' that can discover all NUT-supported devices.

This tool can also discover NUT servers ('upsd') available on the network. A Java binding ('jNutScanner') is already available. Python and Perl bindings (Perl is required for OCS Inventory NG and Fusion Inventory) are underway.

Integration of this component into OCS Inventory NG and Fusion Inventory is underway to provide the needed data, which will then be used by the provisioning server.

nut-scanner has been part of the official NUT distribution since 2.6.2. NUT 2.6.3 adds weak runtime dependencies. This allows nut-scanner to be compiled with all features, while only enabling at runtime the features whose dependencies are satisfied (e.g. SNMP scans are only available if the Net-SNMP libraries are installed).
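The weak runtime dependency idea can be illustrated in Python with ctypes. ctypes locates and loads shared libraries at runtime, in the spirit of a dlopen()-style load, so a feature is enabled only if its library is actually present. This is a minimal sketch and not nut-scanner's code. The mapping of scan types to library names is a hypothetical assumption, and what find_library returns depends on the system.

```python
import ctypes
import ctypes.util
from typing import Dict, Mapping

# Hypothetical mapping of scan types to the shared libraries they would need.
OPTIONAL_LIBS: Mapping[str, str] = {
    "usb": "usb",              # libusb
    "snmp": "netsnmp",         # Net-SNMP
    "xml_http": "neon",        # neon (HTTP/XML)
    "avahi": "avahi-client",   # Avahi (mDNS discovery of NUT servers)
    "nut_old": "upsclient",    # NUT client library
}


def library_available(name: str) -> bool:
    """Locate and load a shared library at runtime; False if it is missing."""
    path = ctypes.util.find_library(name)
    if path is None:
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


def available_scans(libs: Mapping[str, str] = OPTIONAL_LIBS) -> Dict[str, bool]:
    """Report which scan types could be enabled on this system."""
    return {scan: library_available(lib) for scan, lib in libs.items()}


if __name__ == "__main__":
    for scan, enabled in available_scans().items():
        print(f"{scan:10} {'enabled' if enabled else 'disabled (library not found)'}")
```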

Implementation is available, but not yet packaged. To test it, retrieve and compile NUT 2.6.3:

$ wget http://www.networkupstools.org/source/2.6/nut-2.6.3.tar.gz
$ tar xzvf nut-2.6.3.tar.gz
$ cd nut-2.6.3
$ ./configure && make
$ cd tools/nut-scanner
$ ./nut-scanner
Scanning USB bus:
[nutdev1]
        driver=usbhid-ups
        port=auto
        vendorid=0x463
        productid=0x0001
        serial=AV2G3300L
Scanning XML/HTTP bus:
Scanning NUT bus (old connect method):
        xcp-usb@localhost
        hid-usb@localhost
        snmp1@localhost
        simu@localhost
        nmc-eaton@localhost
        powernap@localhost

If you provide SNMP information (network range, community name, ...), you will also get SNMP entries (here in parseable format):

$ ./nut-scanner -qSP --mask_cidr 166.99.224.0/24
SNMP:driver="snmp-ups",port="166.99.224.46",desc="Eaton ePDU",mibs="eaton_epdu",community="public"
SNMP:driver="snmp-ups",port="166.99.224.59",desc="Eaton ePDU",mibs="eaton_epdu",community="public"
SNMP:driver="snmp-ups",port="166.99.224.151",desc="Eaton ePDU",mibs="eaton_epdu",community="public"
SNMP:driver="snmp-ups",port="166.99.224.167",desc="Eaton ePDU",mibs="eaton_epdu",community="public"
SNMP:driver="snmp-ups",port="166.99.224.161",desc="Evolution",mibs="mge",community="public"
SNMP:driver="snmp-ups",port="166.99.224.94",desc="IBM",mibs="ietf",community="public"
SNMP:driver="snmp-ups",port="166.99.224.94",desc="IBM 1440VA/1000W Rack HV UPS",mibs="pw",community="public"
SNMP:driver="snmp-ups",port="166.99.224.171",desc="Evolution",mibs="mge",community="public"

A parseable format such as the one above makes it easy to create wrappers for language bindings.
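As an illustration, a minimal Python sketch of such a wrapper could parse each 'BUS:key="value",...' line into a dictionary. It assumes that values never contain embedded double quotes, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# key="value" pairs as printed by nut-scanner in parseable mode (-P).
# Assumption: values contain no embedded double quotes.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_scan_line(line: str) -> Dict[str, str]:
    """Parse e.g. 'SNMP:driver="snmp-ups",port="166.99.224.46",...' into a dict."""
    bus, sep, rest = line.strip().partition(":")
    if not sep:
        raise ValueError(f"not a parseable nut-scanner line: {line!r}")
    entry = dict(_PAIR.findall(rest))
    entry["bus"] = bus
    return entry


def parse_scan_output(text: str) -> List[Dict[str, str]]:
    """Parse every non-empty line of 'nut-scanner -P' output."""
    return [parse_scan_line(ln) for ln in text.splitlines() if ln.strip()]
```

The text could come from running the command shown above, for example with subprocess.run(["nut-scanner", "-qSP", "--mask_cidr", "166.99.224.0/24"], capture_output=True, text=True, check=True).stdout.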

Provisioning Power protection

NOTE: the previous step may be replaced by, or complemented with, the NUT device discovery mechanism.

Once power devices have been discovered (or alongside this step), provisioning will have to handle the deployment of the power software as part of the operating system installation.

Definition of an interface to retrieve inventory data is part of a separate project (refer to Hardware inventory in Cobbler).
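One step of this deployment could be generating the ups.conf sections for the discovered devices. The minimal Python sketch below assumes that the devices come as dictionaries, such as those obtained from the parseable nut-scanner output shown above, plus a 'bus' key. The section naming scheme (<bus><n>) and the selection of copied keys are hypothetical choices.

```python
from typing import Dict, Iterable, List

# ups.conf keys copied from each discovered device, in output order (assumed subset).
_KEYS = ("driver", "port", "desc", "mibs", "community",
         "vendorid", "productid", "serial")


def _quote(value: str) -> str:
    """Quote values containing whitespace, as ups.conf requires."""
    return f'"{value}"' if any(c.isspace() for c in value) else value


def to_ups_conf(devices: Iterable[Dict[str, str]]) -> str:
    """Render discovered devices as ups.conf sections named <bus><n> (hypothetical)."""
    counters: Dict[str, int] = {}
    sections: List[str] = []
    for dev in devices:
        bus = dev.get("bus", "dev").lower()
        counters[bus] = counters.get(bus, 0) + 1
        lines = [f"[{bus}{counters[bus]}]"]
        lines += [f"\t{key} = {_quote(dev[key])}" for key in _KEYS if key in dev]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

Note that the SNMP scan above reports 166.99.224.94 twice (ietf and pw MIBs). This sketch would emit two sections for that device, so a real provisioning tool would have to choose one of them, or let the administrator do so.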

Cloud power awareness

FIXME...

Cloud power management

Cobbler / Orchestra integration

A simple way to provide NUT features within the existing infrastructure is to create a 'fence_nut' fence agent.

This would provide a similar call interface to the fence-agents currently used in Cobbler / Orchestra.

The Cobbler template (templates/power/power_nut.template) would be:

fence_nut -a "$power_address" -n "$power_id" -o "$power_mode"

with power_id:

  • 1-n: outlet ID, using the outlet commands (outlet.%i.<action>): load.off, off, cycle

  • 0 or null: whole system, using either the outlet commands (outlet.<action>) or the unprefixed commands: load.off, on; shutdown.stayoff, return; reboot[.graceful]

This requires the implementation of fence_nut.
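As a starting point for fence_nut, here is a minimal Python sketch of the translation from fence-agent arguments to a NUT 'upscmd' call. The action-to-command tables are a hypothetical reading of the power_id rules above, and only the on, off and reboot actions are covered. upscmd takes '-u user -p password <ups> <command>'.

```python
from typing import List, Optional

# Hypothetical mapping of fence-agent actions (-o) to NUT instant commands.
OUTLET_ACTIONS = {"on": "load.on", "off": "load.off", "reboot": "load.cycle"}
SYSTEM_ACTIONS = {"on": "load.on", "off": "load.off", "reboot": "shutdown.reboot"}


def nut_command(action: str, power_id: Optional[str]) -> str:
    """Map a fence action and Cobbler power_id to a NUT instant command name."""
    if power_id in (None, "", "0"):
        return SYSTEM_ACTIONS[action]  # whole system
    outlet = int(power_id)
    if outlet < 1:
        raise ValueError(f"invalid outlet id: {power_id!r}")
    return f"outlet.{outlet}.{OUTLET_ACTIONS[action]}"


def upscmd_argv(address: str, action: str, power_id: Optional[str],
                user: str, password: str) -> List[str]:
    """Build the upscmd call; address is a NUT device name such as 'pdu1@host'."""
    return ["upscmd", "-u", user, "-p", password, address,
            nut_command(action, power_id)]
```

The resulting list could be passed to subprocess.run(..., check=True). Passing the password with -p makes it visible in the process list, so a real agent would need a safer credential mechanism. A 'status' action would also need a separate query, for example reading 'outlet.%i.status' with upsc.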

Foundation

The most reliable way to ensure NUT availability for fence_nut, and for general power information gathering, is to have a local NUT installation.

All networked power devices with which provisioning / cloud can interact should have a local NUT driver.

Servers could then easily be started without relying on external software.

The added bonus is that, since this adds a second source for the same devices, we gain communication redundancy, which makes us less prone to communication issues.
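To illustrate the redundancy argument, here is a minimal Python sketch that asks the local upsd first and falls back to other NUT servers in order. It assumes that each upsd listens on the default NUT port (3493) and answers the 'GET VAR <ups> <var>' protocol command with a 'VAR <ups> <var> "<value>"' line. The function names and the fallback policy are hypothetical.

```python
import socket
from typing import Callable, List, Sequence


def nut_get_var(host: str, ups: str, var: str,
                port: int = 3493, timeout: float = 2.0) -> str:
    """Query one upsd with the NUT network protocol's GET VAR command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"GET VAR {ups} {var}\n".encode("ascii"))
        with sock.makefile("r", encoding="ascii", newline="\n") as reply:
            line = reply.readline().rstrip("\r\n")
    prefix = f'VAR {ups} {var} "'
    if line.startswith(prefix) and line.endswith('"'):
        return line[len(prefix):-1]
    raise RuntimeError(f"{host}: unexpected reply {line!r}")  # e.g. "ERR UNKNOWN-UPS"


def get_var_redundant(hosts: Sequence[str], ups: str, var: str,
                      query: Callable[[str, str, str], str] = nut_get_var) -> str:
    """Try each upsd in order (local first) and return the first good answer."""
    errors: List[str] = []
    for host in hosts:
        try:
            return query(host, ups, var)
        except (OSError, RuntimeError) as exc:  # refused, timed out, or bad reply
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all NUT servers failed: " + "; ".join(errors))
```

For example, get_var_redundant(["localhost", "10.0.0.2"], "ups1", "ups.status") would only contact the second (illustrative) server if the local upsd did not answer.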

Draft notes

Configuration library and tool

Implementation of a Python module should be done in 'scripts/python/module/PyNUT.py'. The class should be named 'PyNUTClient'.

PowerChain implementation

Implementation details:

  • Create a new nut-psu driver,

  • Create a new NUT variable, 'device.parent', defined in ups.conf and exposed by all NUT drivers. This will store the parent reference in NUT canonical form:

    device[:outlet][@hostname[:port]]

    Example:
      ups.conf
        [psu1]
            driver = nut-psu
            port = psu1
            parent = pdu1:outlet1@localhost
        [pdu1]
            driver = snmp-ups
            port = <ip address>
            parent = ups1@localhost
        [ups1]
            driver = usbhid-ups
            port = auto
            parent = main

      $ upsc psu1 device.parent
      pdu1:outlet1@localhost
  • Implement power-chain support in upsc ('-P' option to get a tree list, when using '-l' and '-L')

      $ upsc -P localhost
      psu1 -> pdu1:outlet1 -> ups1

      $ upsc -Pl localhost
      ups1
       |-> pdu1:outlet1
            |-> pdu1
  • Implement power-chain support in upsmon
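The power-chain walk behind the proposed 'upsc -P' output can be sketched in Python. The sketch below parses the canonical form device[:outlet][@hostname[:port]] and follows device.parent values until it reaches 'main', as in the ups.conf example above. It is hypothetical code, not NUT's implementation. It assumes that the parent values are already collected in a dictionary (e.g. from 'upsc <device> device.parent'), that device names are unique across hosts, and that hostnames are not IPv6 literals.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Canonical NUT reference: device[:outlet][@hostname[:port]]
_REF = re.compile(r"^(?P<device>[^:@]+)(?::(?P<outlet>[^@]+))?"
                  r"(?:@(?P<host>[^:]+)(?::(?P<port>\d+))?)?$")
ROOT = "main"  # parent value of a device fed directly by the mains


@dataclass
class ParentRef:
    device: str
    outlet: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None

    def short(self) -> str:
        return f"{self.device}:{self.outlet}" if self.outlet else self.device


def parse_parent(ref: str) -> ParentRef:
    m = _REF.match(ref.strip())
    if m is None:
        raise ValueError(f"not a canonical NUT reference: {ref!r}")
    port = m.group("port")
    return ParentRef(m.group("device"), m.group("outlet"), m.group("host"),
                     int(port) if port else None)


def power_chain(device: str, parents: Dict[str, str]) -> List[str]:
    """Follow device.parent values from a device up to the mains."""
    chain, seen = [device], {device}
    current = parents.get(device, ROOT)
    while current != ROOT:
        ref = parse_parent(current)
        if ref.device in seen:
            raise ValueError(f"power-chain loop at {ref.device!r}")
        seen.add(ref.device)
        chain.append(ref.short())
        current = parents.get(ref.device, ROOT)
    return chain
```

With the example configuration, " -> ".join(power_chain("psu1", {"psu1": "pdu1:outlet1@localhost", "pdu1": "ups1@localhost", "ups1": "main"})) yields "psu1 -> pdu1:outlet1 -> ups1". The loop check guards against misconfigured parents that point back at each other.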

Improved fence-agents

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Packaging

Packaging will happen in Debian, and will then be synchronized into Ubuntu.

The current short-term TODO list is:

  • Create a 'nut-client' package, including upsmon, upsc, upsrw, upscmd, and related files
  • Create a documentation package 'nut-doc', provided by default by 'nut-doc-html' (multi-page version) and 'nut-doc-pdf'
  • Distribute Augeas lenses:
    • scripts/augeas/*.aug to /usr/share/augeas/lenses/
    • scripts/augeas/tests/*.aug to /usr/share/augeas/lenses/test/
    • Distribute by default with 'nut' or
  • Distribute device-recorder.sh (needs to be renamed to 'nut-recorder') with nut-dev
  • Create a 'python-nut' package (name to be discussed) including PyNUT (scripts/python/module/)
  • Distribute scripts/perl/Nut.pm (perl-nut?)

With the 2.8.0 NUT release (around September), the following items are also scheduled:

  • Enable SSL support, through Mozilla NSS
  • Distribute new drivers (nut-psu, nut-powernap(?), ...)
  • Distribute nut-scanner and configuration tools and library (separate package?)

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.

References

Power efficiency and PUE


CategorySpec

ServerPreciseCloudPowerManagement (last edited 2012-01-18 13:54:03 by aquette)