CloudInfrastructure

This page collects the proceedings from sessions in the 'Cloud Infrastructure' track at the Natty UDS in Orlando, Florida.

Please add proceedings by doing the following:

  • Add a new section with the name of the session.
  • Add the outcomes of the session as a collection of bullet points. The goal of the proceedings is to focus on decided outcomes, so please keep them crisp.

Thanks!

Proceedings

Bootstrap puppet from deployment service (for UEC and more)

cloud-server-n-install-bootstrap-puppet

  • write a cron job to poll the installation service DB for expected agent requests (see the sketch after this list).
  • upstream work to support CSR extensions (i.e. a puppet token) in the puppet agent.
  • extend the puppet CA to validate CSR extensions (i.e. the puppet token).
  • extend the installation service to include the puppet token in the installation procedure.
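
A minimal sketch of the cron-driven poller above, assuming a hypothetical sqlite database and expected_agents table in the installation service; the schema, paths, and the whitelist hand-off to the puppet CA are illustrative, not decided interfaces:

    #!/usr/bin/env python
    # Poll the installation service DB for hosts that are expected to
    # request puppet certificates, and record their tokens where the
    # puppet CA's validation hook can find them.
    # The DB path, table layout and whitelist file are hypothetical.
    import sqlite3

    DB = '/var/lib/install-service/hosts.db'

    def expected_agents():
        conn = sqlite3.connect(DB)
        try:
            return conn.execute(
                "SELECT hostname, token FROM expected_agents "
                "WHERE state = 'installing'").fetchall()
        finally:
            conn.close()

    if __name__ == '__main__':
        whitelist = open('/etc/puppet/expected-agents', 'a')
        for hostname, token in expected_agents():
            whitelist.write('%s %s\n' % (hostname, token))
        whitelist.close()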

Eucalyptus next steps

natty commitments:

  • ebs root
  • disable api termination
  • instance initiated shutdown behavior
  • import keypair
  • some user management similar to AWS Identity and Access Management (IAM)
    • improved command-line tools for create, delete, ...
  • tagging (not yet committing to full AWS tagging support)
  • snapshot sharing (user sharing snapshots, not instances)
  • default instance types
    • knowledge of m1.2xlarge ...
    • ephemeral devices for ephemeral1..4
  • CreateImage

Ubuntu / Eucalyptus TODOs for the natty cycle:

  • loader support - to go up to Eucalyptus from Ubuntu
  • review of legacy operating systems and virtio disk/network support
  • review of ubuntu patches
  • review of java dependencies
  • euca2ools 2.0 / boto updates (without these, most of the features above are not exposed)
  • Eucalyptus / Ubuntu to collaborate on / share test suites
  • somehow need to get to GWT 2.1 (or 2.0)
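
Several of these commitments map directly onto EC2 API parameters that boto already exposes; a rough sketch of exercising them against a Eucalyptus front end (the endpoint, credentials, and image id are placeholders, and a boto version with the newer calls such as import_key_pair is assumed):

    # Exercise some of the Natty-targeted features via boto's EC2 API.
    # Endpoint, credentials and image id are placeholders.
    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name='eucalyptus', endpoint='clc.example.com')
    conn = EC2Connection('ACCESS_KEY', 'SECRET_KEY', is_secure=False,
                         port=8773, path='/services/Eucalyptus',
                         region=region)

    # import keypair
    conn.import_key_pair('mykey', open('/root/.ssh/id_rsa.pub').read())

    # disable API termination + instance-initiated shutdown behavior
    # ('stop' only makes sense for EBS-root images)
    res = conn.run_instances('emi-12345678', instance_type='m1.small',
                             disable_api_termination=True,
                             instance_initiated_shutdown_behavior='stop')

    # tagging
    conn.create_tags([res.instances[0].id], {'role': 'demo'})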

How can Ubuntu use awstrial

Improvements to the Ubuntu Cloud Images

cloud-server-n-cloud-images

  • Scott will document how to seed cloud-init from the kernel command line (a rough sketch of the mechanism follows this list)
  • Scott and John will get Ubuntu cluster compute instance types onto EC2 during the Natty cycle
  • image root filesystem size for natty to be resized to 10G
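
Scott's documentation will cover the details; as a rough illustration of the mechanism, a seed URL can be carried on the kernel command line and fetched early in boot (the cloud-config-url key and the NoCloud seed path used below are assumptions for the purpose of the sketch):

    # Illustrative sketch: read a cloud-config seed URL from the kernel
    # command line and stage it where cloud-init's NoCloud source looks.
    # The 'cloud-config-url=' key and the seed path are assumptions.
    import os
    import urllib2

    def seed_url(path='/proc/cmdline'):
        for token in open(path).read().split():
            if token.startswith('cloud-config-url='):
                return token.split('=', 1)[1]
        return None

    url = seed_url()
    if url:
        data = urllib2.urlopen(url).read()
        seed_dir = '/var/lib/cloud/seed/nocloud'
        if not os.path.isdir(seed_dir):
            os.makedirs(seed_dir)
        open(os.path.join(seed_dir, 'user-data'), 'w').write(data)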

Installation service for physical nodes deployments (UEC and more)

cloud-server-n-install-service

  • Overall architecture outlined.
  • 4 potential projects identified:
    • FAI
    • cobbler
    • openQRM
    • lp:uec-provisioning
  • More discussion required to gather additional requirements.

Ongoing maintenance with puppet (for UEC and more)

cloud-server-n-config-mgmt-with-puppet

  • deploy a puppet master alongside the installation service.
  • package mcollective to get a scalable distributed command infrastructure (e.g. to trigger puppet runs).
  • write puppet modules to configure UEC components (CLC, CC, NC).
    • available in /etc/puppet/modules/.
  • link the puppet master to the installation service via external_nodes to get classes assigned to each system (see the sketch after this list).
  • tie the installation service to the puppet master to gather the list of available classes.
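
A minimal sketch of the external_nodes classifier mentioned above, assuming a hypothetical HTTP endpoint on the installation service that returns one class name per line (puppet would point at this script via the external_nodes and node_terminus = exec settings in puppet.conf):

    #!/usr/bin/env python
    # Puppet external_nodes classifier: given a node name on argv,
    # ask the installation service which classes are assigned to it
    # and print the YAML document puppet expects on stdout.
    # The installation-service URL is a hypothetical example.
    import sys
    import urllib2

    node = sys.argv[1]
    url = 'http://install-service.example.com/nodes/%s/classes' % node
    classes = urllib2.urlopen(url).read().split()

    print 'classes:'
    for cls in classes:
        print '  - %s' % cls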

Openstack packaging

  • Meta packaging to ensure a toolkit approach
    • nova-*-mysql, -sqlite, and (pre?)-depend on sane defaults.
  • Packaging branch to be moved to a new team; Ubuntu ultimately owns the packages.
  • Commits undergo merge proposals for peer review.
  • Upstartify
  • Apport hook (a sketch follows this list)
  • Monitoring scripts (collectd / munin)
  • Release cycle alignment
    • Natty freezes February 24; releases April 28.
    • Branch Nova code by Ubuntu feature freeze (Feb 24), backport fixes until FinalFreeze (Apr 14)

  • Current state of Nova and Swift packaging: it works. It's in pretty good shape. NASA is using it.
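
A minimal sketch of the apport hook item above, using apport's standard source-package hook interface (the file name follows apport's source_<package>.py convention; the log and config paths are assumptions about the eventual nova packaging):

    # /usr/share/apport/package-hooks/source_nova.py (illustrative)
    # apport calls add_info(report) when a bug is filed against nova.
    from apport.hookutils import attach_file_if_exists

    def add_info(report):
        # Attach logs and configuration useful for triage; the paths
        # are assumptions, not settled packaging decisions.
        attach_file_if_exists(report, '/var/log/nova/nova-compute.log',
                              'NovaComputeLog')
        attach_file_if_exists(report, '/etc/nova/nova.conf', 'NovaConf')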

Components of OpenStack NOVA:

  • Compute
  • API
  • Volume
  • objectstore (simple implementation of S3; it may go away and be replaced by Swift)
  • instancemonitor (monitors loads put on server by virtual machines and graphs them, storing them in the objectstore)
  • scheduler
  • network
  • python-nova
  • nova-common
  • There is no packaging dependency on rabbitmq-server, but there is a runtime dependency. For a DB, sqlite, postgres, or mysql will work. Rick considers which DB is packaged and configured to be an Ubuntu decision.
  • SWIFT has a different set of components.
  • Dave: Current default -- sqlite; could we consider mysql?
  • Soren: Wants to make sure install is no-questions; sqlite was "dead simple"
  • Packaging targets single-node use case
  • More complex deployments should be handled through the install service

Issues to handle offline:

  • How to package the databases, authentication packages
  • preseed-ability

Rebundling and other cloud utilities

UEC EC2 compatibility

  • Testing framework to ensure EC2 compatibility
    • Possibly use txAWS.
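
txAWS is the candidate framework; as a rough illustration of the kind of checks such a suite would run, here is a probe sketched with boto as a stand-in (endpoint and credentials are placeholders):

    # Probe a UEC endpoint for basic EC2 API compatibility: issue a
    # few core calls and report which ones are accepted.
    # boto is a stand-in here; the session suggested txAWS.
    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo
    from boto.exception import EC2ResponseError

    region = RegionInfo(name='uec', endpoint='clc.example.com')
    conn = EC2Connection('ACCESS_KEY', 'SECRET_KEY', is_secure=False,
                         port=8773, path='/services/Eucalyptus',
                         region=region)

    for name, call in [('DescribeImages', conn.get_all_images),
                       ('DescribeInstances', conn.get_all_instances),
                       ('DescribeKeyPairs', conn.get_all_key_pairs)]:
        try:
            call()
            print '%s: ok' % name
        except EC2ResponseError, err:
            print '%s: FAIL (%s)' % (name, err.error_code)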

Monitoring probes and alerting service (for UEC and more)

cloud-server-n-monitoring-alerting

Actions:

  • move collectd to main.
  • should munin go to universe? (probably not yet)
  • find a graphing solution (munin, graphite, Reconnoiter (OmniTI, not packaged), visage).

Web scale enhancements

cloud-server-n-webscale-tech

  • Further discussion of the rampant embedding of libraries in new technology is needed to determine whether to try to change upstream practices or to change our policies.
  • Low-hanging fruit for packaging - php5-gearman, zeromq
  • Amazon EC2 tools that are missing will be reviewed and added in a separate blueprint
  • Cassandra PPA from the Maverick cycle will continue to be maintained -- Launchpad stats for PPAs are coming "RSN"
  • CLB project will be continued, and clb-* utilities paralleling the Amazon Elastic Load Balancer tools will be created (continuation of server-maverick-uds-cloud-loadbalancing)

Hadoop packaging

CDH3 will be used as the foundation for Ubuntu. Cloudera packages will be reviewed and tested.

  • review other CDH packages for improvements in the user experience and Ubuntu integration:
    1. hbase
    2. pig
    3. hive
    4. hue
    5. oozie
    6. sqoop
  • review zookeeper patches and identify which should be integrated into the Debian/Ubuntu packages.
  • file bugs, write patches, and have them integrated in CDH3.
    • publish hadoop packages into a PPA and point Cloudera to it for integration.
  • integration with installation service (whenever that is ready).

Ubuntu desktop cloud images

In order to get fully supportable "Ubuntu" images into main, we'll need the following actions.

  • [action] get the freenx server into the main archive (currently in a PPA)
  • [action] get an open NX client into the archive (qtnx is already in the archive)
  • [action] create desktop images with freenx server from main

In order to have a very slick user demo, we have to:

  • [action] have the image offer an open NX client via a web connection
  • [action] Canonical services perhaps contact NoMachine about an optimal OSS NX client

  • [action] must have a Windows client, perhaps a Mac client too (a proprietary one might be acceptable here?)
  • [action] we would need the unity-qt implementation, since we don't have 3D acceleration in the cloud

Handle virtual networking in the cloud

  • Open vSwitch seems an appropriate solution to the virtual networking issue. OpenStack is planning on using it. It is being packaged in Debian; we will have to adapt the kernel module to use DKMS until it reaches the mainline kernel.

  • Other existing tools like tinc and n2n (already in universe) could be blogged about, as they solve the problem of secure meshed communications.

Openstack gap analysis

http://wiki.openstack.org/Nova/EucalyptusFeatureComparison

Two areas not already on upstream roadmap:

  • SOAP API -> nobody cares

  • Web UI -> UEC cares, Rackspace may care, session tomorrow at 11

Chuck's daily picks:

  • .eucarc / novarc indirection -> namespace suggestion?

  • pvgrub "floppy" support for Ubuntu-style kernel upgrades

Distributed logging

Use rsyslog as the foundation for building distributed logging.

  • Support RELP in main (MIR for librelp)
  • write puppet recipes to automatically configure rsyslog
  • Integrate in UEC:
    • configure a central rsyslog on the CLC
    • configure an aggregator rsyslog on the CC
    • configure central logging via rsyslog on the NC, SC, and Walrus
    • use syslog for all UEC components
    • write a script (grep++) to automatically track messages related to an InstanceId (see the sketch after this list)

  • look at packaging Reconnoiter in Ubuntu to use it for reporting and presentation
  • look at packaging a log analyzer
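
A minimal sketch of the grep++ idea above: given an instance id, collect every line mentioning it from the aggregated log on the CLC (the central log path is an assumption about the rsyslog configuration; per-component log formats are not modeled):

    #!/usr/bin/env python
    # grep++: print all aggregated syslog lines that mention a given
    # instance id, in the order they appear in the central log.
    # The log location is an assumption about the rsyslog setup.
    import sys

    LOG = '/var/log/uec/aggregated.log'

    def track(instance_id, log=LOG):
        for line in open(log):
            if instance_id in line:
                sys.stdout.write(line)

    if __name__ == '__main__':
        track(sys.argv[1])  # e.g. i-12345678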

Application checkpoint/restart

Application checkpoint/restart in Linux (linux-cr.org) provides the ability to checkpoint, restart, and migrate application and system containers, giving a very lightweight mechanism for load-balancing in the cloud.

Actions:

  • Create a ppa with the kernel and userspace packages needed to experiment with c/r
    • Create a project in lp
    • When ppa is up, Gustavo will blog about how to use it
  • Create bindings for libvirt to use lxc.sf.net
    • Then libvirt can handle
      • auto-start of containers on boot
      • creation of a bridge for containers

UEC Web interface

Make LXC ready for production

Conclusions:

  • Some kernel patches (setns, ipvs, ns-cgroup-removal) are heading upstream
    • kernel team may backport those into natty
    • ns cgroup is being deprecated - should be turned off
      • MUST be associated with taking the clone-children control file patch to replace ns cgroup functionality
  • For more forward-looking and experimental lxc patches,
    • Create a kernel based on natty hosted on kernel.ubuntu.com
    • Create a ppa with both custom kernel and lxc package to exploit it
    • Examples of functionality:
      • user namespace
      • containerized syslog
      • tinyproc (see below)
  • Investigate solutions for /proc and /sys containerization
    • One attractive solution was to separate proc from container-safe tinyproc
      • could be a mount option
        • a CAP_HOST_PROC capability is required for mounting full proc
        • tinyproc does not provide /proc/sysrq-trigger, for instance
  • Networking:
    • We should let libvirt handle creation of bridge
    • Someone should investigate getting netcf working in Debian and Ubuntu
      • To play nicely with NetworkManager
  • Container auto-start on boot
    • Let libvirt handle it
  • Meeting schedule for Friday to investigate a libvirt binding for liblxc
    • Summary from that meeting:
      • Action for natty to make a debootstrapped image work on host and in container
      • Action for Soren to look at libvirt-lxc console bug
        • (Serge to file a bug)
      • Action to create a new libvirt-container driver, based on openvz driver, which execs lxc.sf.net programs.
        • Ping libvirt community for reaction
        • Updating the existing driver to match lxc.sf.net functionality is too much duplicated work.
      • Long term, we would like to have the container driver call out to lxc.sf.net library - much more work
  • Upstart script for lxc
    • We should see if we can let libvirt handle it all
  • Action: find someone willing to work on a script on top of lxc to ease container creation (see the sketch after this list)
  • Action: find someone to push top/ps/netstat/etc containerization patches upstream
  • Action: pursue solutions to container reboot and poweroff
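
A minimal sketch of the container-creation helper action above, wrapping the lxc.sf.net command-line tools; the network defaults and the generated config are illustrative, and the ubuntu template is the one whose hacks are mentioned later on this page:

    #!/usr/bin/env python
    # Ease container creation: write a small lxc network config and
    # call lxc-create with the ubuntu template. Defaults are
    # illustrative, not a proposed interface.
    import subprocess
    import tempfile

    def create_container(name, bridge='br0', template='ubuntu'):
        cfg = tempfile.NamedTemporaryFile(delete=False)
        cfg.write('lxc.utsname = %s\n' % name)
        cfg.write('lxc.network.type = veth\n')
        cfg.write('lxc.network.link = %s\n' % bridge)
        cfg.write('lxc.network.flags = up\n')
        cfg.close()
        subprocess.check_call(['lxc-create', '-n', name,
                               '-f', cfg.name, '-t', template])

    if __name__ == '__main__':
        create_container('test1')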

Containerize ptrace/kill

The security team has an interest in smarter ptrace controls; however, these do not mesh with this work. They want to mostly prevent ptrace, but allow ptrace_traceme (ab)use by/for debuggers, tracers, and fault handlers. Containers will prevent tasks inside the container from allowing ptrace by a task outside the container. User namespaces would likely be too coarse-grained, lumping together an entire KDE or Wine session and allowing all tasks in one such session to ptrace each other.

However, the containerization of kill and ptrace are deemed 'a good thing.' Kees recommends pushing the patchset.

UEC QA for Natty

Containers in UEC

Use cases:

  • support old hardware
  • support testing with ensemble
  • support on ARM
  • high density

ACTION: (chuck) Code and test patches for Eucalyptus. This involves both adding LXC as a target and adding the parametric data that would allow the scheduler to select targets that can run the image.

ACTION: For UEC, code and test the patches.

ACTION: Investigate increasing the UEC VM-per-processor setting to some default value > 1

ACTION: (serge) Work with platform team to make a stock ubuntu image work in containers

  • Daniel suggests lxc can pass a 'boot' argument to init/upstart
  • Modify /etc/init/*.conf to
    • not run udev
    • emit the needed events to keep boot proceeding in a container
  • See the hacks in /usr/lib/lxc/templates/lxc-ubuntu

ACTION: investigate whether any UEC changes are needed to terminate a libvirt-lxc instance (by kill -9'ing the init)
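
For reference while investigating, termination through the libvirt Python bindings looks like this; destroy() is libvirt's hard stop, which for the LXC driver amounts to killing the container's init (the domain name is a placeholder):

    # Hard-stop an lxc domain via libvirt; for the LXC driver this
    # kills the container's init. The domain name is a placeholder.
    import libvirt

    conn = libvirt.open('lxc:///')
    dom = conn.lookupByName('instance-00000001')
    dom.destroy()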

KVM/Libvirt hypervisor work

actions:

Non-KVM qemu for ARM-on-x86 testing was brought up, but it was felt to be not worth doing. Upstream (Peter Smile :) needs to do work to merge Nokia's qemu-omap tree into upstream, at which point we can package it. QEMU on ARM is not interesting; KVM on ARM will become interesting in a few years. Xen was brought up: we have DomU support, and the kernel team only wants to support Dom0 if it gets pushed upstream.

Cloud-init / cloud-config improvements

Automated server testing

Topics for discussion:

* Automated Server ISO Testing (http://launchpad.net/ubuntu-server-iso-testing)
  * Achievements to date
    * Automation of ISO testing in Maverick.
       (12 test cases/2 architectures)
    * Control through ec2 hosted Hudson.
       (Automated triggering based on daily CD images)
       (Automated collation of test results)
  * Current challenges:
    * Test execution overhead and concurrency.
    * Lack of dedicated hardware.
  * Current state of project
    * Not packaged for Ubuntu, so not that accessible
    * Codebase needs a tidy-up + full documentation
  * Next steps:
    * General tidy-up of project and codebase
    * Packaging for Ubuntu to ease adoption
    * Release 1.0; comms to potential adopters to provide physical resources.

* Automated EC2 AMI Testing using existing framework + Hudson

* Automated package testing using puppet, kvm, etc. for UEC, Hadoop, and any other moderately complex stack.

        Use case: new Kerberos infrastructure - how do we test it?
                * Set up multiple instances
                * puppet + required virtual machines/physical machines
                * Google Summer of Code project - puppet to manage libvirt
                * Workflow (a sketch of this lifecycle follows below):
                        1) Hudson creates instances - config into CouchDB?
                        2) Puppet configures instances
                        3) Set up checkpoint
                        4) Execute tests
                        5) Hudson teardown
                * Potential to re-use for UEC burn testing
                * Test data: puppet recipes, test data, the actual tests; encapsulate current QA regression tests
                * Current regression testing does not always leave working infrastructure
                * Triggered on presence of a new version of a package in the archive
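
A rough sketch of the instance lifecycle such a Hudson job would drive, with boto and EC2 defaults as stand-ins (the AMI id and key name are placeholders, and the puppet/checkpoint/test steps are left as comments since their design was left open):

    # Skeleton of the lifecycle a Hudson test job would drive:
    # launch, wait, configure/test, tear down.
    # AMI id and key name are placeholders.
    import time
    from boto.ec2.connection import EC2Connection

    conn = EC2Connection()  # credentials taken from the environment

    res = conn.run_instances('ami-00000000', key_name='hudson',
                             instance_type='m1.small')
    inst = res.instances[0]
    try:
        while inst.update() != 'running':   # 1) Hudson creates instances
            time.sleep(5)
        # 2) puppet configures instances
        # 3) set up checkpoint
        # 4) execute tests (e.g. over ssh)
    finally:
        inst.terminate()                    # 5) Hudson teardown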

* Continuous performance non-regression testing

* Requirements to production-ise
        * Hudson into the data centre -> appropriate Hudson packaging required for IS support
        * Single server required for deployment of ISO testing.

Actions:

  • [jamespage] Move tests to checkbox and integrate with guest.
  • [jamespage/smoser] Automate EC2 testing and increase depth of image testing using unittest/subunit
  • [jamespage] - move to normal PXE + TFTP instead for broader fit with potential test architectures.
  • [mark] - Output plugin for checkbox to write to CouchDB
  • [mark] - Checkbox plugin to download tests from CouchDB to execute
  • [jamespage] - review what the iso overlay looks like and refactor as required.
  • [jamespage] - fix concurrency in ISO download.
  • [hggdh/jamespage] - way forward on production deployment of ISO testing.
  • [jamespage/mathiaz] proof of concept for complex package testing - use case openldap/mysql?

Deferred:

  • gPXE into the archive (update to etherboot) - maybe some licensing issues.