NetbootManagement

Differences between revisions 1 and 7 (spanning 6 versions)
Revision 1 as of 2005-10-15 12:01:16
Size: 2158
Editor: p5494F2A3
Comment:
Revision 7 as of 2005-11-02 20:33:17
Size: 10088
Editor: 209
Comment:
Line 1: Line 1:
##(see the SpecSpec for an explanation)

 * Created: 15. Okt by ReinhardTartler
 * Priority: NeedsPriority
 * People: NeedsLead, NeedsSecond
 * Contributors: siretart
 * Interested:
 * Status: UbzSpecification, BrainDump (then DraftSpecification then EditedSpecification then ApprovedSpecification), DistroSpecification
 * Branch:
 * Malone bug:
 * Packages affected:
 * Depends:
 * Dependents:
 [[FullSearch()]]
 * BoF sessions: none yet
 * '''Launchpad Entry''': https://launchpad.net/distros/ubuntu/+spec/cluster-installation
 * '''Created''': [[Date(2005-10-27T21:14:37Z)]] by JaneWeideman
 * '''Contributors''': JaneWeideman, ReinhardTartler, IvanKrstic
 * '''Packages affected''':
 * Depends: ConfigurationInfrastructure, AuthenticationInfrastructure | NetworkAuthentication
Line 19: Line 9:
Enabling users to easily setup a computer pool (cluster) with ubuntu installations. We want to enable users to easily perform mass installations of Ubuntu on a pool of machines by developing a console tool to intelligently manage dhcpd configuration. We provide a GUI frontend for Edubuntu or similar environments.
Line 23: Line 13:
We already have fully automated installation with d-i using preseeding (and/or kickstart). We plan to have a CentralAuthentication facility. Let's put all parts together so they work out-of-the-box We already support fully automated installation with d-i using preseeding (and/or kickstart).
Let's put all parts together so they work out-of-the-box, and provide a reasonable interface to them.
Line 27: Line 18:
The obvious ones:

 * Internet cafe clusters, where anonymous users can surf
 * Educational clusters at universities and/or schools

The less obvious ones:

 * Offices, when all workstations needs to be able to install in a timely manner, but are still customizable enough for the local admin.
 * Andreas is running an Internet cafe on several thick clients. He doesn't want to use the LTSP functionality in Breezy, but wants to install Ubuntu on each machine individually, and be able to trivially reinstall any machine.
 
 * Kathrin runs HPC computing clusters at her university. She needs to be able to install Ubuntu on 300 compute nodes, which are contributed by five departments. Each department's nodes form a logical group. Kathrin wants to have each node register itself with the resource manager upon finishing installation. She also wants nodes to stage certain files from the central server locally. The whole pool, or any logical group, should be trivial to reinstall by performing the appropriate action in our management tool (in the console or the GUI frontend), and then simply rebooting the nodes.
 
 * Reinhard is a teacher at Kathrin's university. He knows that his department's logical group in Kathrin's cluster (composed of sixty machines) is currently idle, as there are no jobs submitted. So he wants to use the machines as LTSP thin clients with an LTSP server running on Ubuntu on the same network. Kathrin should be able to select the machines in Reinhard's logical group, set them to boot into LTSP on the next boot, and then reboot them all.
 
Line 38: Line 26:
To allow the admin to use both ltsp setups as well as cluster setups, we need some interface to dhcp config. We define therefore netboot modes:
 * computer netboots in ltsp mode. This is what we already have
 * computer netboots in installer mode. This time, the computer gets fully automatically installed using d-i preseeding and/or kickstart
 * The default netboot mode is user/admin definable.
Our management tool needs to be able to set each node, or a group of nodes, to boot as LTSP thin clients, or as regular thick clients from the local disk, or to boot into the (preseeded/kickstarted) Ubuntu unattended installer. The tool should also support initiating the reboot remotely, but this functionality will likely be postponed to another Spec or a later version.
Line 43: Line 28:
The installed system needs to be integrated into the local CentralAuthentication infrastructure. Furthermore we need facilities for keeping the installations uptodate and install extra software packages. The management tool is an intelligent interface to the dhcpd and pxelinux configuration files.

A netboot policy is a simple specification of the file that gets sent to a client that is requesting a PXE boot. We provide two built-in policies that cannot be deleted:
 * Boot from local disk
 * Netboot the Ubuntu installer (kickstart/preseed)

Because we are operating under the assumption that changing the BIOS setting for PXE boot is not an option (it's always set to 'on' -- which would normally be the case in HPC clusters utilizing the technology in this spec), we have to provide a method to actually boot a client from the local disk, even though it's attempting a PXE boot from the server.

The only time a machine that's attempting PXE boots _should_ boot into its local disk, is if it's a thick client machine (possibly a HPC compute node) that's previously had Ubuntu installed on it through our automated preseed/kickstart installer. After consulting with ColinWatson, we decided that the automated installer, after finishing the stage1 install, should send a notification to the installation server that specifies its root device. The installation server keeps a mapping of MAC address to root device for all automatically installed machines. Upon first receiving such notification for a machine that was previously in the 'unknown' nmt group, the installation server automatically removes the machine from the unknown group, and promotes its netboot policy to 'boot from local disk'.

The 'boot to local disk' policy hence depends on the root device mapping, and allows us to serve via PXE a syslinux image to machines set to boot from local disk, which simply chains to the bootloader on the root device specified in the mapping.

== Management tool design ==

The tool is called 'nmt' (netboot management tool). It's a CLI tool; a GUI frontend is available for Edubuntu and similar settings, where we don't want the administrator to have to drop into the shell to configure things.

Initially, the management tool requires specification of two default policies: one for 'known' machines, and one for 'unknown' machines. An unknown machine is one that has never been seen on the network before. After its first netboot (which will happen according to the 'default unknown' policy), the machine is automatically registered with nmt (provided that nmt is running at the time!), and assigned the 'default known' boot policy.

An admin would normally start nmt before netbooting any unknown machines that she wants registered automatically. A 'parse' option also exists for out-of-band operation (see below).

The tool supports the following options:
 * Assign a name to a machine
 * Assign a machine to a group
 * Set the netboot policy on next boot for a machine or a group
 * Parse a dhcpd log file to extract and register all unknown machines (this just means that previously unseen machines that were netbooted, falling under the 'unknown' group policy, now show up in the list of machines in 'nmt' -- still in the 'unknown' group).
 
By default, all machines (even those unregistered with 'nmt') belong to the 'unknown' group, which cannot be deleted.



For example, Andreas, the internet cafe owner from our first use case, will install his main Ubuntu server, and loads the GUI tool. He will set the default netboot policy for unknown machines to 'netboot the installer image' and provide a kickstart file he created with the kickstart configuration tool (already in breezy). When he saves the configuration, he can boot the rest of the computers in his internet cafe. They will fall under the default netboot policy for unknown machines, and will start the installer image, installing a standard internet cafe desktop fully unattended.

So the GUI tool:
 * allows the administrator to select which image is booted the next time each machine netboots.
 * allows to define default netboot settings for both known and unknown machines.

The Admin has to provide either a kickstart or a d-i preseed file.

After that automated installation, a post-install script running at the end of stage1 hooks up with the ConfigurationInfrastructure. This is then responsible for integration with distributing config file and handling software updates.

The main interface would be a 4 coloumn table like the following example:

| IP | MAC | Name | Boot policy | Actions |
| <dynamic> | 00:01:02... | unassigned | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | <reinstall next boot> | (reboot now!) |

The last coloumn gives a set of these: predefined actions for performing next boot:
 * run the installer
 * boot locally
 * boot LTSP (if configured/available on this system)

In addition to that, the admin can provide names of other boot targets. In this case, the experienced admin has to provide sufficient configuration in the dhcpconfig hisself.
Line 49: Line 85:
 * Implement managment facilities using cfengine.
 * Implement a GUI Tool using these interfaces, so that a local admin can register machines and defines the netboot method. Takes kickstart or preseed files form the admin and enables fully (or semi) automated installs.
Line 53: Line 90:
=== Data preservation and migration === '''DHCP Configuration/pxe loader configuration tool:'''

 * List machines, mac addresses, friendly name and next boot action
 * List default boot action
 * Define friendly name for a given mac address
 * Setup static IP address for a given mac address
 * list a logical group
 * Add/Remove a mac address from a logical group
 * update the next boot action for a mac address or a logical group

'''GUI:'''
 
 provide a GTK front end to the above tool that a classroom teacher can
 use easily.

The main interface would be a 4 column table like the following example:

| IP | MAC | Name | Action on next boot | Actions |
| <dynamic> | 00:01:02... | unassigned | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | <reinstall next boot> | (reboot now!) |

changing IP address, Name or the Action drop-down are all supported.
The table should always start with the default action on the top line.

'''Client side post install script to update server:'''

'''Server-side daemon to listen for client update messages and dynamically update
pxe boot status to allow local boot after install'''
 
=== Data preservation and migration ===
Line 58: Line 124:
 * CentralAuthentication  * ConfigurationInfrastructure
Line 62: Line 128:
The main interface would be a 4 coloumn table like the following example:

| IP | MAC | Name | Action on next boot | Actions |
| <dynamic> | 00:01:02... | unassigned | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | <reinstall next boot> | (reboot now!) |


== Notes ==

Need to send to main server: just where to locate the next bootloader to chain to
(keep in mind having to potentially translate grub syntax for locating the bootloader)
- modify grub/lilo installers to put the final install device somewhere we can read it from
- remind cjwatson
Line 63: Line 143:
----

== Comments ==
 - at the moment, Edubuntu ships a default dhcpd.conf that allows LTSP clients to boot without any special configuration. ideally, this dhcpd.conf would be replaced with a simple invocation to 'nmt' specifying the default policy for unknown machines is 'boot LTSP'.

Summary

We want to enable users to easily perform mass installations of Ubuntu on a pool of machines by developing a console tool to intelligently manage dhcpd configuration. We provide a GUI frontend for Edubuntu or similar environments.

Rationale

We already support fully automated installation with d-i using preseeding (and/or kickstart). Let's put all parts together so they work out-of-the-box, and provide a reasonable interface to them.

Use cases

  • Andreas is running an Internet cafe on several thick clients. He doesn't want to use the LTSP functionality in Breezy, but wants to install Ubuntu on each machine individually, and be able to trivially reinstall any machine.
  • Kathrin runs HPC computing clusters at her university. She needs to be able to install Ubuntu on 300 compute nodes, which are contributed by five departments. Each department's nodes form a logical group. Kathrin wants to have each node register itself with the resource manager upon finishing installation. She also wants nodes to stage certain files from the central server locally. The whole pool, or any logical group, should be trivial to reinstall by performing the appropriate action in our management tool (in the console or the GUI frontend), and then simply rebooting the nodes.
  • Reinhard is a teacher at Kathrin's university. He knows that his department's logical group in Kathrin's cluster (composed of sixty machines) is currently idle, as there are no jobs submitted. So he wants to use the machines as LTSP thin clients with an LTSP server running on Ubuntu on the same network. Kathrin should be able to select the machines in Reinhard's logical group, set them to boot into LTSP on the next boot, and then reboot them all.

Design

Our management tool needs to be able to set each node, or a group of nodes, to boot as LTSP thin clients, or as regular thick clients from the local disk, or to boot into the (preseeded/kickstarted) Ubuntu unattended installer. The tool should also support initiating the reboot remotely, but this functionality will likely be postponed to another Spec or a later version.

The management tool is an intelligent interface to the dhcpd and pxelinux configuration files.

A netboot policy is a simple specification of the file that gets sent to a client that is requesting a PXE boot. We provide two built-in policies that cannot be deleted:

  • Boot from local disk
  • Netboot the Ubuntu installer (kickstart/preseed)

Because we are operating under the assumption that changing the BIOS setting for PXE boot is not an option (it's always set to 'on' -- which would normally be the case in HPC clusters utilizing the technology in this spec), we have to provide a method to actually boot a client from the local disk, even though it's attempting a PXE boot from the server.

The only time a machine that attempts a PXE boot _should_ boot from its local disk is when it is a thick client (possibly an HPC compute node) that has previously had Ubuntu installed through our automated preseed/kickstart installer. After consulting with ColinWatson, we decided that the automated installer, after finishing the stage1 install, should send a notification to the installation server that specifies the machine's root device. The installation server keeps a mapping of MAC address to root device for all automatically installed machines. On first receiving such a notification for a machine that was previously in the 'unknown' nmt group, the installation server automatically removes the machine from the unknown group and promotes its netboot policy to 'boot from local disk'.
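
The notification handling described above could look roughly like the following sketch. The message format (a MAC address and a root device separated by whitespace), the function name and the 'installed' group name are all assumptions made for illustration; the spec does not define them.

```python
LOCAL_DISK = "boot from local disk"


def handle_install_notification(message, root_devices, groups, policies):
    """Record a client's root device and promote it out of 'unknown'.

    `message` is a hypothetical one-line notification of the form
    "<mac> <root device>", e.g. "00:01:02:0a:0b:0c /dev/hda1".
    """
    mac, root_device = message.split()
    mac = mac.lower()

    # The server keeps a MAC -> root device mapping for installed machines.
    root_devices[mac] = root_device

    # Machines that were never grouped explicitly count as 'unknown'.
    if groups.get(mac, "unknown") == "unknown":
        groups[mac] = "installed"   # hypothetical group name
        policies[mac] = LOCAL_DISK
    return mac
```

Machines that an admin has already placed in another group keep their group and policy; only the root device mapping is updated for them.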

The 'boot from local disk' policy hence depends on the root device mapping. For machines set to boot from local disk, we serve a syslinux image via PXE that simply chains to the bootloader on the root device specified in the mapping.
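
As an illustration, a per-machine pxelinux configuration that chains to the local disk might be generated as sketched below. pxelinux looks for a file named after the client's MAC address (prefixed with the ARP hardware type 01) under pxelinux.cfg/, and chain.c32 accepts targets of the form "hd0 1". The translation from a Linux device name to a BIOS disk number is a naive assumption (hda or sda is taken to be disk 0), and the function names are hypothetical.

```python
import re


def pxelinux_config_name(mac):
    """Return the pxelinux.cfg/ file name for a MAC, e.g. 01-00-01-02-0a-0b-0c."""
    octets = re.split(r"[:-]", mac.strip().lower())
    if len(octets) != 6 or not all(re.fullmatch(r"[0-9a-f]{2}", o) for o in octets):
        raise ValueError("not a MAC address: %r" % mac)
    return "01-" + "-".join(octets)


def chain_target(root_device):
    """Translate /dev/hda1 -> 'hd0 1' (assumes disk letter order == BIOS order)."""
    match = re.fullmatch(r"/dev/[hs]d([a-z])(\d*)", root_device)
    if match is None:
        raise ValueError("unsupported root device: %r" % root_device)
    disk = ord(match.group(1)) - ord("a")
    partition = match.group(2)
    return "hd%d %s" % (disk, partition) if partition else "hd%d" % disk


def render_local_boot(root_device):
    """Render a pxelinux config that chains to the bootloader on root_device."""
    return (
        "DEFAULT local\n"
        "LABEL local\n"
        "  KERNEL chain.c32\n"
        "  APPEND %s\n" % chain_target(root_device)
    )
```

A real implementation would take the device location reported by the installer rather than guess the BIOS disk order, which is the translation problem mentioned in the Notes.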

Management tool design

The tool is called 'nmt' (netboot management tool). It's a CLI tool; a GUI frontend is available for Edubuntu and similar settings, where we don't want the administrator to have to drop into the shell to configure things.

Initially, the management tool requires specification of two default policies: one for 'known' machines, and one for 'unknown' machines. An unknown machine is one that has never been seen on the network before. After its first netboot (which will happen according to the 'default unknown' policy), the machine is automatically registered with nmt (provided that nmt is running at the time!), and assigned the 'default known' boot policy.

An admin would normally start nmt before netbooting any unknown machines that she wants registered automatically. A 'parse' option also exists for out-of-band operation (see below).
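
The two default policies and the automatic registration can be sketched as follows. The class and method names are hypothetical and the policy strings are placeholders; the sketch only models the behaviour described above, where the first netboot is served under the 'default unknown' policy and the machine is then registered with the 'default known' policy.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Minimal model of nmt's known/unknown policy handling (illustrative)."""

    default_unknown: str
    default_known: str
    policies: dict = field(default_factory=dict)  # MAC -> boot policy

    def on_netboot(self, mac):
        """Return the policy to serve for this boot, registering new machines."""
        mac = mac.lower()
        if mac in self.policies:
            return self.policies[mac]
        # First sighting: serve the 'default unknown' policy now and
        # register the machine so that later boots use 'default known'.
        self.policies[mac] = self.default_known
        return self.default_unknown
```

For example, with `Registry("netboot installer", "boot from local disk")`, the first call to `on_netboot` for a new MAC address returns 'netboot installer' and every later call returns 'boot from local disk'.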

The tool supports the following options:

  • Assign a name to a machine
  • Assign a machine to a group
  • Set the netboot policy on next boot for a machine or a group
  • Parse a dhcpd log file to extract and register all unknown machines (this just means that previously unseen machines that were netbooted, falling under the 'unknown' group policy, now show up in the list of machines in 'nmt' -- still in the 'unknown' group).

By default, all machines (even those unregistered with 'nmt') belong to the 'unknown' group, which cannot be deleted.
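
A rough sketch of the 'parse' option follows. It assumes syslog lines in the style ISC dhcpd usually writes ("DHCPDISCOVER from <mac> via eth0", "DHCPREQUEST for <ip> from <mac> via eth0"); the exact log format on a given server, and the function name, are assumptions.

```python
import re

# Matches the client MAC in DHCPDISCOVER / DHCPREQUEST log lines.
MAC_IN_LOG = re.compile(
    r"DHCP(?:DISCOVER|REQUEST)\b.*?\bfrom ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def unknown_macs(log_lines, known):
    """Return MACs seen in the log that are not registered yet, in log order."""
    known = {mac.lower() for mac in known}
    found = []
    for line in log_lines:
        match = MAC_IN_LOG.search(line)
        if match is None:
            continue
        mac = match.group(1).lower()
        if mac not in known and mac not in found:
            found.append(mac)
    return found
```

The returned machines would then be listed in nmt while remaining in the 'unknown' group.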

For example, Andreas, the Internet cafe owner from our first use case, installs his main Ubuntu server and loads the GUI tool. He sets the default netboot policy for unknown machines to 'netboot the installer image' and provides a kickstart file he created with the kickstart configuration tool (already in Breezy). Once he saves the configuration, he can boot the rest of the computers in his Internet cafe. They fall under the default netboot policy for unknown machines and start the installer image, installing a standard Internet cafe desktop fully unattended.

So the GUI tool:

  • allows the administrator to select which image is booted the next time each machine netboots.
  • allows the administrator to define default netboot settings for both known and unknown machines.

The admin has to provide either a kickstart or a d-i preseed file.

After the automated installation, a post-install script running at the end of stage1 hooks the machine into the ConfigurationInfrastructure, which is then responsible for distributing configuration files and handling software updates.

The main interface would be a five-column table like the following example:

| IP | MAC | Name | Boot policy | Actions |
| <dynamic> | 00:01:02... | unassigned | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | <reinstall next boot> | (reboot now!) |

The 'Boot policy' column offers the following predefined actions for the next boot:

  • run the installer
  • boot locally
  • boot LTSP (if configured/available on this system)

In addition, the admin can provide names of other boot targets. In that case, the experienced admin has to supply the necessary dhcpd configuration by hand.

Implementation

  • Define interface to dhcpd config to define netboot actions
  • Define interface to create/update/edit preseed configs
  • Implement a GUI tool using these interfaces, so that a local admin can register machines and define the netboot method. It takes kickstart or preseed files from the admin and enables fully (or semi) automated installs.

Code

DHCP Configuration/pxe loader configuration tool:

  • List machines, MAC addresses, friendly names and next boot actions
  • List the default boot action
  • Define a friendly name for a given MAC address
  • Set up a static IP address for a given MAC address
  • List a logical group
  • Add/remove a MAC address from a logical group
  • Update the next boot action for a MAC address or a logical group
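
To illustrate what the tool would write for a named machine with a static IP address, here is a minimal sketch that renders a dhcpd.conf host declaration. The `host`, `hardware ethernet`, `fixed-address` and `filename` statements are standard dhcpd.conf syntax; the function name and the choice of "pxelinux.0" as the default boot file are assumptions.

```python
def render_host(name, mac, ip=None, filename="pxelinux.0"):
    """Render a dhcpd.conf host declaration for one machine."""
    lines = [
        "host %s {" % name,
        "  hardware ethernet %s;" % mac,
    ]
    if ip is not None:
        # Only machines with a static assignment get a fixed address.
        lines.append("  fixed-address %s;" % ip)
    # The boot file served to this client on its next PXE boot.
    lines.append('  filename "%s";' % filename)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_host("chinstrap", "00:02:03:04:05:06", ip="10.2.3.4"))
```

Updating the next boot action for a logical group would then amount to regenerating these declarations (or the matching pxelinux files) for every MAC address in the group.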

GUI:

  • Provide a GTK front end to the above tool that a classroom teacher can use easily.

The main interface would be a five-column table like the following example:

| IP | MAC | Name | Action on next boot | Actions |
| <dynamic> | 00:01:02... | unassigned | <boot locally> | (reboot now!) |
| 10.2.3.4 | 00:02:03:... | chinstrap | <reinstall next boot> | (reboot now!) |

Changing the IP address, the name, or the action drop-down is supported. The table should always start with the default action on the top line.

Client side post install script to update server:

Server-side daemon to listen for client update messages and dynamically update pxe boot status to allow local boot after install

Data preservation and migration

Outstanding issues

Needs integration with:

Notes

Need to send to main server: just where to locate the next bootloader to chain to (keep in mind having to potentially translate grub syntax for locating the bootloader)

  • modify grub/lilo installers to put the final install device somewhere we can read it from
  • remind cjwatson

BoF agenda and discussion


Comments

  • At the moment, Edubuntu ships a default dhcpd.conf that allows LTSP clients to boot without any special configuration. Ideally, this dhcpd.conf would be replaced with a simple invocation of 'nmt' specifying that the default policy for unknown machines is 'boot LTSP'.

NetbootManagement (last edited 2008-08-06 16:24:21 by localhost)