NetbootManagement

Should be renamed to MassInstallationInfrastructure

Summary

We want to enable users to easily perform mass installations of Ubuntu on a pool of machines by developing a console tool to intelligently manage dhcpd and syslinux configuration. We provide a GUI frontend for Edubuntu and similar environments. A tiny housekeeping daemon is also required; its functionality is very simple and it doesn't need root privileges.

N.B. We use the term 'cluster' to mean 'a pool of machines'. While this could be a pool of cluster nodes, it doesn't have to be. A more accurate name for the spec would be UbuntuMassInstallation.

Rationale

We already support fully automated installation with d-i using preseeding (and/or kickstart). Let's put all parts together so they are easy to use even for less experienced administrators, and provide a reasonable interface to them.

Use cases

  • Andreas runs an Internet cafe with several thick clients. He doesn't want to use the LTSP functionality in Breezy; instead, he wants to install Ubuntu on each machine individually, and be able to trivially reinstall any machine.
  • Kathrin runs a high-performance computing (HPC) cluster at her university. She needs to be able to install Ubuntu on 300 compute nodes, which are contributed by five departments. Each department's nodes form a named group. Kathrin wants to have each node register itself with the resource manager upon finishing installation. She also wants nodes to stage certain files from the central server locally. The whole pool, or any named group, should be trivial to reinstall by performing the appropriate action in our management tool (in the console or the GUI frontend), and then simply rebooting the nodes.
  • Reinhard is a teacher at Kathrin's university. He knows that his department's named group in Kathrin's cluster (composed of sixty machines) is currently idle, as there are no jobs submitted. So he wants to use the machines as LTSP thin clients with an LTSP server running on Ubuntu on the same network. Kathrin should be able to select Reinhard's named group, set it to boot into LTSP on the next boot, and then reboot all the machines in the group.

Design

The management tool we discuss is called 'nmt' (netboot management tool). It is able to set the next boot policy for each controlled node, or a named group of nodes. Initially, all nodes belong to the 'unknown' group.

A boot policy is a simple specification of the file that gets sent to a client that is requesting a PXE boot. Example policies include:

  • boot as a regular thick client from the local disk
  • boot into the (preseeded/kickstarted) Ubuntu unattended installer
  • restore a system image (re-image the machine)
  • save a system image (snapshot the machine)
  • boot as an LTSP thin client

nmt has the first four policies built-in. The LTSP policy would be shipped by the Ubuntu LTSP package. The tool should also support initiating the reboot remotely, but this functionality will likely be handled by the ConfigurationInfrastructure spec.

Built-in groups are:

  • unknown: A machine that has never been seen before is automatically placed in the unknown group: as soon as nmtd detects the machine in the dhcpd log files, it assigns it to this group.

  • local boot: Machines in this group boot from the local disk. The assignment is automatic and immutable, and applies only to machines that have previously been installed by the unattended Ubuntu installer, itself controlled by nmt.

All computers to be controlled by nmt are set to PXE boot. We operate under the assumption that, once turned on, it would be impractical to turn PXE booting off on a per-machine basis (as is certainly the case with, for example, computing clusters). This is why we have to provide a way for a client to end up booting from its local disk even though it is attempting a PXE boot from the server.

The only time a machine that is attempting PXE boots _should_ boot from its local disk is when it is a thick client machine (possibly an HPC compute node) that previously had Ubuntu installed on it through our automated preseed/kickstart installer. After consulting with ColinWatson, we decided that the automated installer, after finishing the stage1 install, should send a notification to the installation server specifying its root device. The installation server keeps a mapping of MAC addresses to root devices for all automatically installed machines. Upon first receiving such a notification for a machine that was previously in the unknown nmt group, the installation server automatically removes the machine from the unknown group and places it into the built-in local boot group.

The 'boot to local disk' policy hence depends on the root device mapping, and allows us to serve via PXE a syslinux image which simply chains to the bootloader on the root device specified in the mapping.
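Concretely, the file served to such a machine could be a small pxelinux configuration fragment like the following (the path and the use of syslinux's chain.c32 module are assumptions for illustration; pxelinux looks up a per-machine file named after the client's MAC address):

```
# /tftpboot/pxelinux.cfg/01-00-02-03-04-05-06  -- hypothetical per-MAC config
DEFAULT local
LABEL local
    COM32 chain.c32
    APPEND hd0        # chain to the bootloader on the mapped root device
```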

The notification at the end of the first stage of the installer is received on the installation server by a tiny daemon called nmtd. nmtd runs as non-root, and its sole purposes are to receive stage1 installation notifications and parse the dhcpd logs in real-time to provide nmt with up-to-date information.
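A minimal sketch of the log-watching half of nmtd might look like the following Python (the function name is illustrative; the log line shown is the usual ISC dhcpd syslog format, which can vary between versions and configurations):

```python
import re

# Typical ISC dhcpd syslog line (format may vary):
#   Nov  3 17:10:05 server dhcpd: DHCPACK on 10.2.3.4 to 00:02:03:04:05:06 via eth0
DHCPACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d+\.\d+\.\d+\.\d+) "
    r"to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE)

def parse_dhcpd_line(line):
    """Return (mac, ip) if the line records a completed lease, else None."""
    m = DHCPACK_RE.search(line)
    if m:
        return m.group("mac").lower(), m.group("ip")
    return None
```

nmtd would tail the dhcpd log, feed each line through a parser like this, and register any previously unseen MAC address in the unknown group.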

nmt is a CLI tool; a GUI frontend is available for less experienced system administrators, who want to avoid dropping into the shell to configure things.

The tool supports the following actions:

  • Assign a name to a machine
  • Assign a machine to a group
  • Set the netboot policy on next boot for a machine or a group
  • List all known machines, or all machines belonging to a group
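To make the actions above concrete, here is a minimal in-memory sketch of the bookkeeping they imply (illustrative only; the real nmt would persist this state, and all names here are hypothetical):

```python
class NmtState:
    """Toy model of nmt's machine bookkeeping."""

    def __init__(self):
        # mac -> {"name": ..., "group": ..., "policy": ...}
        self.machines = {}

    def register(self, mac):
        # New machines start in the built-in 'unknown' group.
        self.machines.setdefault(
            mac, {"name": None, "group": "unknown", "policy": None})

    def set_name(self, mac, name):
        self.machines[mac]["name"] = name

    def set_group(self, mac, group):
        self.machines[mac]["group"] = group

    def set_policy(self, target, policy, by_group=False):
        # Set the next-boot policy for one machine or a whole group.
        for mac, rec in self.machines.items():
            if (by_group and rec["group"] == target) or \
               (not by_group and mac == target):
                rec["policy"] = policy

    def list_machines(self, group=None):
        return sorted(mac for mac, rec in self.machines.items()
                      if group is None or rec["group"] == group)
```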

Addressing the use cases

Here we explain how the tools we will build, nmt and nmtd, address each of our use cases.

  • Andreas, the internet cafe owner from our first use case, will install his main Ubuntu server and load the nmt GUI tool. He will set the default policy for the 'unknown' group to 'boot the unattended Ubuntu installer'. When he applies the nmt configuration, he can boot the rest of the computers in his internet cafe. Because of the 'unknown' group policy, the machines will get Ubuntu automatically installed. After stage1 of the install, the installer will send its root device to nmtd on the server, which will automatically place the machine into the built-in 'local boot' group. On subsequent reboots, the auto-installed machines will be served a PXE boot image that instructs them to boot from local disk.
  • Kathrin installs her main Ubuntu server, customizes the kickstart file for her cluster to perform file staging and registration with the resource manager, and turns on her 300 compute nodes. They are installed fully automatically. Then she creates named groups for the five departments, and reassigns the appropriate machines from the 'local boot' group (where they were automatically placed at the end of the stage1 installer) to the groups she just created. She sets the default policy for all department groups to 'boot from local disk'. She has a fully operational HPC cluster.
  • Kathrin places Reinhard in sudoers on her installation server, which allows him to fire up nmt and set the default boot policy for his department's group to 'boot LTSP', and back to 'boot from local disk' when he's done. Reinhard is able to easily use his department machines as thin clients when they're not being used as compute nodes.

nmt interface design

The nmt interface is a simple table:

| IP | MAC | Name | Group | Boot policy | Actions |
| <dynamic> | 00:01:02... | unassigned | unknown | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | unknown | <reinstall next boot> | (reboot now!) |

Implementation

  • Define an interface to the dhcpd config for setting netboot actions
  • Define an interface to create/update/edit preseed configs
  • Implement a GUI tool using these interfaces, so that a local admin can register machines and define the netboot method. It takes kickstart or preseed files from the admin and enables fully (or semi-) automated installs.

Code

Both nmt and nmtd will be written in Python. They will use a SQLite database to share state. The automatic stage1 installer is modified to send a completion notification to the installation server.
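The spec does not fix a schema, but the shared SQLite state could be as small as a single table; a hypothetical sketch:

```python
import sqlite3

# Illustrative schema only -- the actual schema is not specified by this spec.
# 'grp' is used as the column name because GROUP is an SQL keyword.
SCHEMA = """
CREATE TABLE IF NOT EXISTS machines (
    mac         TEXT PRIMARY KEY,                 -- client MAC address
    ip          TEXT,                             -- last address seen in dhcpd logs
    name        TEXT,                             -- friendly name assigned via nmt
    grp         TEXT NOT NULL DEFAULT 'unknown',  -- built-in or named group
    policy      TEXT,                             -- netboot policy for next boot
    root_device TEXT                              -- reported by the stage1 installer
);
"""

def open_state(path):
    """Open (and initialize if needed) the state DB shared by nmt and nmtd."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```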

  • Client-side post-install script to update the server
  • Server-side daemon to listen for client update messages and dynamically update the PXE boot status to allow local boot after install
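The wire format of the client update message is not specified; as an illustration, a trivial one-line format with matching client-side and server-side helpers could look like this (names and format are assumptions):

```python
def format_notification(mac, root_device):
    """Client side: one-line message the post-install script would send."""
    return "stage1-done %s %s\n" % (mac, root_device)

def parse_notification(line):
    """Server side (nmtd): return (mac, root_device), or None if malformed."""
    parts = line.strip().split()
    if len(parts) == 3 and parts[0] == "stage1-done":
        return parts[1], parts[2]
    return None
```

On receipt, nmtd would record the root device in the shared state and move the machine into the built-in 'local boot' group.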

Data preservation and migration

Outstanding issues

Needs integration with:

  • NetworkwideUpdates
  • NetworkAuthentication

Notes

- nmt could provide an easy graphical interface for configuring fixed IP addresses in dhcpd.conf

- Eventually, we will need to provide integration with NetworkwideUpdates and NetworkAuthentication, which will both be hooked into the post-installation stage of the automatic installer.

- The admin can design a custom kickstart/preseed file. We'll generate one at the end of the installation server install, which duplicates a standard Ubuntu install using defaults given at install time.

- The default password in the default preseed file will be randomly generated and presented in nmt.

ajmitch offers help with implementation. Nicolas Kassis offers a testbed for nmt.

New policy: image restoring, i.e. netbooting a machine to restore a preinstalled image. Another policy could create such an image.

Need to send to the main server: just where to locate the next bootloader to chain to (keep in mind having to potentially translate GRUB syntax for locating the bootloader). Modify the grub/lilo installers to put the final install device somewhere we can read it from. Remind cjwatson.

BoF agenda and discussion


Comments

  • At the moment, Edubuntu ships a default dhcpd.conf that allows LTSP clients to boot without any special configuration. Ideally, this dhcpd.conf would be replaced with a simple invocation of 'nmt' specifying that the default policy for unknown machines is 'boot LTSP'.

NetbootManagement (last edited 2008-08-06 16:24:21 by localhost)