ServerKarmicCloudPowerManagement

Revision 1 as of 2009-06-03 18:27:50


Summary

In Jaunty, we proved that at least some Ubuntu servers can successfully suspend and hibernate, and can resume from remote triggers such as wake-on-LAN.

In Karmic, we hope to integrate into existing cloud management frameworks:

  1. tools for proactive sysadmins who want to manually migrate workloads and manually suspend/resume servers
  2. automatic algorithms that can perform these operations in an unattended manner when certain configurable thresholds or conditions are met
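
The unattended mode in item 2 could be driven by a small policy check like the one sketched below. This is only an illustration under stated assumptions: the names `PowerPolicy`, `Node` and `nodes_to_suspend`, the threshold fields and their default values are all hypothetical and are not defined by this specification or by any existing cloud framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PowerPolicy:
    # Hypothetical, admin-configurable thresholds (not defined by this spec).
    idle_seconds_before_suspend: int = 600
    load_threshold: float = 0.1
    min_hot_nodes: int = 1  # nodes that must always stay awake


@dataclass
class Node:
    name: str
    load: float        # e.g. 1-minute load average
    idle_seconds: int  # time since the node last hosted a workload
    awake: bool = True


def nodes_to_suspend(nodes: List[Node], policy: PowerPolicy) -> List[str]:
    """Return the names of awake nodes that the policy allows suspending."""
    awake = [n for n in nodes if n.awake]
    candidates = [
        n for n in awake
        if n.load <= policy.load_threshold
        and n.idle_seconds >= policy.idle_seconds_before_suspend
    ]
    # Prefer the nodes that have been idle the longest.
    candidates.sort(key=lambda n: n.idle_seconds, reverse=True)
    # Never let the number of awake nodes drop below min_hot_nodes.
    allowed = max(0, len(awake) - policy.min_hot_nodes)
    return [n.name for n in candidates[:allowed]]
```

A caller would feed the result to whichever suspend mechanism the administrator has chosen; the sketch only decides which nodes qualify and performs no power operation itself.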

Release Note

This section should include a paragraph describing the end-user impact of this change. It is meant to be included in the release notes of the first release in which it is implemented. (Not all of these will actually be included in the release notes, at the release manager's discretion; but writing them is a useful exercise.)

It is mandatory.

Rationale

This should cover the _why_: why is this change being proposed, what justifies it, where we see this justified.

User stories

Assumptions

Design

You can have subsections that better describe specific parts of the issue.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this.

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since any specification with problems cannot be approved.

UDS Raw Notes

  • Machines migrate when there is light load.
  • For Eucalyptus
    • Boot nodes into suspended state.
    • Wake up as needed.
  • Use live migration to compress a cloud into the minimum amount of hardware.
    • Suspend unneeded hardware.
  • pm-suspend and pm-hibernate are now in server seed.
  • Re-establishing network connectivity after resume may be an issue.
  • Manual process of compressing the cloud hardware.
    • SSH to send the suspend command and wakeonlan to resume.
  • With new improvements to boot speed, actually powering off may be useful.
  • Hibernate may be preferable when reloading a cache.
  • Support poweroff, hibernate and suspend depending on the admin preference.
  • Support IPMI as well as wakeonlan.
  • Make framework configurable as to what tools to use.
  • Do a wakeonlan testing day to generate a list of working hardware.
  • Wakeonlan
    • Simple
    • Doesn't scale
  • IPMI
    • Server hardware required.
    • Authentication supported
  • NUT
    • Network UPS Tools
    • Can also wake systems.
  • Use Cases:
    • Manually compress cloud hardware.
    • Process to actively monitor cloud load and adjust hardware according to load.
  • Use libvirt for VM migration.
  • Eucalyptus integration
    • Power Management needs to live in the same space as the service level agreement.
  • Integrate power management into Landscape.
    • Set SLAs in Landscape which then uses the plugins for power management.
  • Use an application to monitor power consumption.
  • May have to target specific hardware.
  • Strongly consider adding power management hooks to libvirt.
    • Could automatically wake up a machine when connecting.
  • What is the appropriate amount of idle time before suspending a system?
  • Allow configuration of the number of hot nodes.
  • Adjust cloud load based on node load.
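
The wakeonlan option noted above as "Simple" comes down to one UDP broadcast. The sketch below builds the standard magic packet (6 bytes of 0xFF followed by the target MAC address repeated 16 times) and sends it. The function names, the default broadcast address and the choice of UDP port 9 are assumptions made for illustration, not part of this specification, and the target NIC and BIOS must have wake-on-LAN enabled for it to have any effect.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet for a MAC such as 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits: %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the 6-byte MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The packet carries no authentication, which is one reason the notes list IPMI as an alternative where server hardware supports it.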


CategorySpec