Revision 12 as of 2010-01-22 19:52:09



Ubuntu Cloud images are generic "fresh install" disk images that are started on various clouds and configured at boot time. Previous releases allowed customization only via a user-provided script, which ran at S99 (similar to a traditional rc.local).

Instead of requiring the user to write a script to customize the image, we would like to provide a configuration file syntax to support common customization, such as installing additional packages, installing all updates ...

Release Note

Ubuntu Cloud Images can now be configured via a human-readable config file syntax. Previously, customization could only be done via user-provided scripts.


Previous Ubuntu releases only supported customization via a script file running late in the boot process. This had many problems:

  • writing shell scripts is complex, easy to screw up
  • only 16k of user data can be passed in an EC2 user-data script
  • script-based customization means that script providers must maintain their scripts themselves.

By providing a Config syntax with a limited number of options, we can make sure that a given config file behaves the same on different versions of the OS. For some users the config syntax may provide all the customization they need.

User stories

  • As a system administrator, I want to deploy an AMI based on the latest Ubuntu Server Edition in EC2. I run the example commands provided, and my instance is started with the latest Ubuntu updates automatically applied. I can confidently allow public access to the instance, knowing that it is always up to date, and I do not accrue any data transfer charges on account of the update process.
  • As a casual user of EC2, I would like to easily make small changes to my instance on its first boot, without having to know details of how the boot process works. I'd like to install some extra packages from the archive and some from my own PPA.



On first boot, the boot hooks implementation (server-lucid-ec2-boothooks, ServerLucidCloudBoothooks) dumps the user data to a local file and then emits a cloud-config event with the CFGFILE environment variable pointing to the config.yaml file. Upstart jobs then read that configuration file to configure the running instance.
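A job started by the cloud-config event might locate its configuration roughly as follows. This is a minimal sketch, not the boot hooks implementation: only the CFGFILE variable name comes from the text above, while the function names, the exception class, and the error handling are assumptions made for illustration.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when the cloud-config event did not supply CFGFILE (hypothetical)."""


def config_path(environ=None):
    """Return the config file path passed in the CFGFILE environment variable."""
    env = os.environ if environ is None else environ
    path = env.get("CFGFILE")
    if not path:
        raise MissingConfigError("CFGFILE is not set; was this job started by cloud-config?")
    return path


def read_config_text(environ=None):
    """Read the raw config text; parsing is left to the plugin."""
    with open(config_path(environ), encoding="utf-8") as handle:
        return handle.read()
```

Passing the environment as an argument keeps the lookup easy to exercise outside a real boot.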

The following is a list of configuration items that we would like to support:

  • update the instance (apt-get upgrade)
  • add additional packages (apt-get install)
  • add repositories (with a shortcut for PPAs) and install packages from these repositories
  • mount EBS volumes at specified location
  • configure ephemeral storage usage
    • RAID 0 or LVM to give improved speed and size
    • define mount locations (/dev/sda2 on /var/log, /dev/sdb on /mnt ...)
  • 'runurl' support

Upstart jobs

Every job depends on at least the cloud-config event:

 start on cloud-config

The following upstart jobs are available:

  • apt_conf: configures apt
  • pkg_install: installs packages after apt_conf
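The ordering of the two jobs can be illustrated with a small planner that turns a parsed configuration into apt-get invocations. This is a sketch under stated assumptions: the apt_update and apt_upgrade keys and their defaults follow the sample configuration file in this spec, but the packages key name and the plan_apt_commands function are hypothetical, and the real jobs may run different commands.

```python
def plan_apt_commands(config):
    """Return the apt-get invocations for one config dict, in execution order."""
    commands = []
    # Defaults follow the sample file: update is on, upgrade is off.
    if config.get("apt_update", True):
        commands.append(["apt-get", "update"])
    if config.get("apt_upgrade", False):
        commands.append(["apt-get", "--assume-yes", "upgrade"])
    # "packages" is an assumed key name for the list of extra packages.
    packages = list(config.get("packages", []))
    if packages:
        commands.append(["apt-get", "--assume-yes", "install"] + packages)
    return commands
```

Returning the commands instead of running them keeps the decision logic separate from execution, so it can be checked without touching the system.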

Sample YAML configuration file

# Update apt database on first boot
# (ie run apt-get update)
# Default: true
apt_update: false

# Upgrade the instance on first boot
# (ie run apt-get upgrade)
# Default: false
apt_upgrade: true

# Add apt repositories
# Default: none

 # PPA shortcut:
 #  * Setup correct apt sources.list line
 #  * Import the signing key from LP
 #  See for more information
 - source: "ppa:user/ppa"    # Quote the string

 # Custom apt repository:
 #  * Creates a file in /etc/apt/sources.list.d/ for the sources list entry
 #  * [optional] Import the apt signing key from the keyserver 
 #  * Defaults:
 #    + keyserver:
 #    + filename: 00-boot-sources.list
 #    See sources.list man page for more information about the format
 - source: "deb lucid main restricted" # Quote the string
   keyid: 12345678 # GPG key ID published on a key server

 # Custom apt repository:
 #  * The apt signing key can also be specified 
 #    by providing a pgp public key block
 #  The apt repository will be added to the default sources.list file:
 #  /etc/apt/sources.list.d/00-boot-sources.list
 - source: "deb ./" # Quote the string
   key: | # The value needs to start with -----BEGIN PGP PUBLIC KEY BLOCK-----
      Version: SKS 1.0.10

      -----END PGP PUBLIC KEY BLOCK-----

# Add apt configuration files
#  Add an apt.conf.d/ file with the relevant content
#  See apt.conf man page for more information.
#  Defaults:
#   + filename: 00-boot-conf

 # Creates an apt proxy configuration in /etc/apt/apt.conf.d/01-proxy
 - filename: "01-proxy"
   content: |
    Acquire::http::Proxy "";

 # Add the following line to /etc/apt/apt.conf.d/00-boot-conf
 #  (run debconf at a critical priority)
 - content: |
    DPkg::Pre-Install-Pkgs:: "/usr/sbin/dpkg-preconfigure --apt -p critical|| true";

# Provide debconf answers
# See debconf-set-selections man page.
# Default: none
debconf_selections: |     # Need to preserve newlines
        # Force debconf priority to critical.
        debconf debconf/priority select critical

        # Override default frontend to readline, but allow user to select.
        debconf debconf/frontend select readline
        debconf debconf/frontend seen false

# Install additional packages on first boot
# Default: none
 - openssh-server
 - postfix

# Send pre-generated ssh private keys to the server
# If these are present, they will be written to /etc/ssh and
# new random keys will not be generated
  rsa_private: |
    -----END RSA PRIVATE KEY-----

  rsa_public: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAGEAoPRhIfLvedSDKw7XdewmZ3h8eIXJD7TRHtVW7aJX1ByifYtlL/HVzJ09nilCl+MSFrpbFnqjxyL8Rr/DSf7QcY/BrGUQbZn2Kc22PemAWthxHO18QJvWPocKJtlsDNi3 smoser@localhost

  dsa_private: |
    -----END DSA PRIVATE KEY-----

  dsa_public: ssh-dss AAAAB3NzaC1kc3MAAACBAM/Ycu7ulMTEvz1RLIzTbrhELJZf8Iwua6TFfQl1ubb1rHwUElOkus7xMhdVjms8AmbV1Meem7ImE69T0bszy09QAG3NImHgZVIeXBoJ/JzByku/1NcOBYilKP7oSIcLJpGUHX8IGn1GJoH7XRBwVub6Vqm4RP78C7q9IOn0hG2VAAAAFQCDEfCrnL1GGzhCPsr/uS1vbt8/wQAAAIEAjSrok/4m8mbBkVp4IwxXFdRuqJKSj8/WWxos00Ednn/ww5QibysHYULrOKJ1+54mmpMyp5CZICUQELCfCt5ScZ9GsqgmnI80Q1h3Xkwbo3kn7PzWwRwcV6muvJn4PcZ71WM+rdN/c2EorAINDTbjRo97NueM94WbiYdtjHFxn0YAAACAXmLIFSQgiAPu459rCKxT46tHJtM0QfnNiEnQLbFluefZ/yiI4DI38UzTCOXLhUA7ybmZha+D/csj15Y9/BNFuO7unzVhikCQV9DTeXX46pG4s1o23JKC/QaYWNMZ7kTRv+wWow9MhGiVdML4ZN4XnifuO5krqAybngIy66PMEoQ= smoser@localhost
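
How the repository and apt configuration entries in the sample might be resolved can be sketched as below. The directory /etc/apt/apt.conf.d and the default filename 00-boot-conf come from the sample's comments. The function names are hypothetical, and the Launchpad URL layout in ppa_sources_line is the conventional one rather than something stated in this spec, so treat it as an assumption. Importing the signing key is not covered.

```python
import posixpath

APT_CONF_DIR = "/etc/apt/apt.conf.d"
DEFAULT_CONF_NAME = "00-boot-conf"


def ppa_sources_line(shortcut, release):
    """Expand 'ppa:user/name' into a sources.list line (assumed URL layout)."""
    if not shortcut.startswith("ppa:"):
        raise ValueError("not a PPA shortcut: %r" % shortcut)
    owner, sep, name = shortcut[len("ppa:"):].partition("/")
    if not owner or not sep or not name:
        raise ValueError("expected ppa:user/name, got %r" % shortcut)
    return "deb http://ppa.launchpad.net/%s/%s/ubuntu %s main" % (owner, name, release)


def group_apt_conf(entries):
    """Map each target file to the concatenated content destined for it."""
    files = {}
    for entry in entries:
        # Entries without a filename share the default file.
        name = entry.get("filename", DEFAULT_CONF_NAME)
        path = posixpath.join(APT_CONF_DIR, name)
        files[path] = files.get(path, "") + entry["content"]
    return files
```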


This config syntax parser will be implemented as a plugin to the generic boot hooks implementation. It will be installed as part of the image.

UI Changes

  • Newly developed programs under the server-lucid-xc2 (ServerLuciXc2) spec may expose supported config options as command line options when creating an instance.

Code Changes

A plugin to the boot hooks implementation will be implemented. It will be passed data that is determined to be intended for the plugin.

This new plugin will need to parse the configuration and act on the supported configuration types. Most likely:

  • the plugin will be implemented in Python
  • the config file format will be read by Python's ConfigParser


These changes will be made in a backwards-compatible manner. Users who were creating scripts to customize their images and passing those scripts to EC2 in user-data will see no changes.

Documentation will be provided on the wiki to point users to the simpler, more user-friendly configuration file syntax.

Test/Demo Plan


Unresolved issues


BoF agenda and discussion

UDS discussion notes

The current method by which users can configure UEC/EC2 images is via 'user-data'. User-data that begins with #! is executed by the appropriate interpreter at the S99 runlevel. This is a very effective method for customization, but one that requires a fair amount of expertise to use.
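
The #! rule can be sketched as a small dispatcher. Only the "#!" prefix check is taken from the notes above. Treating every other payload as configuration data is an assumption made here for illustration, and classify_user_data is a hypothetical name.

```python
def classify_user_data(user_data):
    """Return 'script' for payloads starting with '#!', otherwise 'config'."""
    if isinstance(user_data, bytes):
        # User data arrives as raw bytes; decode leniently for the prefix check.
        user_data = user_data.decode("utf-8", errors="replace")
    if user_data.startswith("#!"):
        return "script"
    # Assumption: anything that is not a script is handed to the config parser.
    return "config"
```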

It would be nice to provide a config-file-like syntax that would allow specifying:

  • which mirrors to use and which additional repositories to add
  • whether or not updates are installed on first boot
  • ssh private keys to use (allowing the user to dictate them, rather than polling for random keys from ec2-console)
  • extra packages to install

Note: this differs from server-lucid-ec2-boothooks in that this blueprint would provide only specific, static functions. [1] would provide a hook into the images that could allow for building this blueprint.

Problems with the status quo

  • writing shell scripts is complex, easy to screw up -- simple things should be simple
  • only 64k of user data can be passed
  • maintenance issues

Use cases

  • package installation (Ubuntu repositories) *simple*
    • including tasks
    • add specific apt repositories
  • package installation (custom repositories) *simple*
  • install latest updates *simple*
  • Pass more than 64k of user data
    • landscape (registration data)
    • puppet (bootstrap (certificate/private key, autoregistration))
    • rightscale (rightscripts)
    • other config mgmt system: cfengine, bcfg, spline, capistrano
  • Advertise specific features [don't know]
  • dynamic DNS
  • EBS mounting and snapshots *simple*
    • udev script mounting of volume based on metadata in the volume
  • ephemeral storage mounting
    • RAID *simple*
  • asynchronous notification that the instance is up and running:
    • email
    • jabber/XMPP
    • simpledb, sqs
    • rabbitmq (message queue)
    • submission to a url *simple-low priority*
  • run custom scripts
    • at what point in the boot process?
      • pass an upstart job
  • pass any type of credentials through user-data (AWS, certificates, keytabs, ssh keys)
    • security issues
    • all data should be kept as safe as possible
      • config option to set perms on said data
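
The point about setting permissions on credentials could look like the following sketch. The write_secret name and the 0o600 default are assumptions rather than part of the discussion. The idea shown is to create the file with a restrictive mode before any data is written, instead of tightening it afterwards.

```python
import os


def write_secret(path, data, mode=0o600):
    """Write data to path, applying the restrictive mode before the content."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, mode)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Enforce the mode even if the file already existed or the umask altered it.
        os.fchmod(handle.fileno(), mode)
        handle.write(data)
```

A config option could then supply a different mode for data that must be readable by a service account.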


  • as early as possible, parses the various sources of user data
  • acts on it

Implementation ideas

  • share syntax, semantics with d-i preseeding
  • use puppet - not a generic solution to the issue, requires infrastructure
  • Simple key/value pairs format, read by plugins
  • Core features: bootstrapping process -- package install, repository add
  • Section / key-value pair format where each section would get read by a specific plugin
  • Credentials transfer: use S3-backend "safe" storage
  • store scripts, data in S3, access via URL
    • runurl
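
The "section per plugin" idea can be sketched with Python's configparser module (the Python 3 name of ConfigParser). Everything specific here is hypothetical: the section names, the option names, the SAMPLE text, and the handler functions are illustrations, not a format agreed on in these notes.

```python
import configparser

# Hypothetical config text: one section per plugin.
SAMPLE = """\
[apt]
update = false
upgrade = true

[packages]
install = openssh-server postfix
"""


def handle_apt(section):
    """Plugin for the [apt] section: read the update/upgrade switches."""
    return {"update": section.getboolean("update", fallback=True),
            "upgrade": section.getboolean("upgrade", fallback=False)}


def handle_packages(section):
    """Plugin for the [packages] section: split the package list."""
    return section.get("install", fallback="").split()


PLUGINS = {"apt": handle_apt, "packages": handle_packages}


def dispatch(text):
    """Hand each known section to its plugin and collect the results."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    results = {}
    for name in parser.sections():
        plugin = PLUGINS.get(name)
        if plugin is None:
            continue  # unknown sections are ignored in this sketch
        results[name] = plugin(parser[name])
    return results
```

Ignoring unknown sections lets an older image accept a newer config file, which fits the goal of one config behaving the same across releases. A stricter implementation might log or reject them instead.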

Other Feedback

  • Notes from EricHammond regarding this snippet in the sample config file:

    •    # Update apt database on first boot
         # (ie run apt-get update)
         # Default: true
         apt_update: false
      • I am concerned about the implication that apt-get update might be run automatically on first boot even when the old style user-data script is provided or when users are configuring through automated ssh on startup. I understand and support the desire to make things easy for new users, but would also suggest that it is also important to let advanced users configure their new instances quickly and not get in their way. Here's a sample scenario:
          user-data script changes apt sources to a different repository,
          includes multiverse, and/or adds a PPA.  user-data script then
          runs apt-get update itself and starts installing software.
      • If apt-get update is run automatically before this starts, then the user-data script has to figure out how to wait for the (useless) update to complete, making the code more complex and delaying the startup of the instance. If the apt-get update were to default to false, then the new config file approach can easily switch it to true, not breaking the way that it currently works. The following proposal seemed to meet some level of agreement on the ubuntu-ec2 and ubuntu-cloud mailing lists (i.e., Mathias said it sounded good and nobody else objected):
          ec2init automatically runs apt-get update on first boot, UNLESS:
          1. a user-data script is provided by the user (starting with #!), OR
          2. the advanced user-data configuration format is provided by the user
             AND that configuration specifies that apt-get update should not be