LDMLoadbalancingSupport

Revision 8 as of 2007-05-08 14:06:23

Summary

Right now, LTSP in Ubuntu has no good solution for using more than one server in thin client deployments.

Rationale

Larger rollouts of LTSP will be possible when it becomes easy to add any number of additional application servers to the network.

Use Cases

Rudolfo has 400 thin clients and four application servers. He would like to balance the load of login sessions across the four application servers to support up to 360 concurrent logins.

Margot teaches computer science at a school in a rural area, and she has 17 recycled thin clients available for her lab, but she does not have a server available to support the 17 TCs. Instead, she has scraped together three desktop class machines for this purpose, and she wants to be able to support all 17 TCs with concurrent logins.

Scope

This spec is concerned with load balancing ldm login sessions across multiple application servers.

Design

Summary

Rationale

Implementation

The first iteration of the implementation is available at: http://ltsp.mindtouchsoftware.com/ltsp-loadbalance.

It consists of a standalone server component (ltsp-server-advertise) that provides clients with statistics about the resources available on the server. It is implemented in Python as a daemon. The daemon waits for incoming queries on a port (currently 377), returns an XML document with the statistics, and closes the socket.
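As a rough illustration of the request/response cycle described above, the following minimal sketch answers every connection with a small XML document and then lets the connection close. It is not the ltsp-server-advertise code: the element names (`server`, `load1`, `cpus`), the helper names, and the choice of statistics are assumptions made for the example, since the XML schema is not yet documented.

```python
import os
import socketserver
import xml.etree.ElementTree as ET


def collect_stats():
    """Gather a few illustrative statistics (field names are assumptions)."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = 0.0  # load average not available on this platform
    return {"load1": f"{load1:.2f}", "cpus": str(os.cpu_count() or 1)}


def build_stats_xml(stats):
    """Serialize a flat dict of statistics as <server><key>value</key>...</server>."""
    root = ET.Element("server")
    for name, value in stats.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="utf-8")


class StatsHandler(socketserver.BaseRequestHandler):
    """Send the statistics document; socketserver closes the socket afterwards."""

    def handle(self):
        self.request.sendall(build_stats_xml(collect_stats()))


def serve(host="0.0.0.0", port=377):
    """Run the daemon loop; binding a port below 1024 normally requires root."""
    with socketserver.TCPServer((host, port), StatsHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the loop. Because the server closes the socket after sending, a client can treat end-of-stream as the end of the document and needs no length prefix.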

The client component is mostly self-contained in its own module (pickserver.py). This module is integrated with ldm, where it periodically queries each server in a predefined list (the current default interval is 5 seconds; a more sensible interval would be 30 seconds or a minute). As a result, the best server is already known by the time the user logs in. The changes required to ldm are minimally invasive, since most of the code (as stated above) is split into a separate module.
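The client side of that exchange might look like the sketch below. It is not the actual pickserver.py: the function names, the flat XML layout, and the returned `(stats, response_ms)` tuple are assumptions for illustration. Only the port number and the "read until the server closes the socket" behavior come from the description above.

```python
import socket
import time
import xml.etree.ElementTree as ET


def parse_stats(xml_bytes):
    """Turn a flat XML stats document into a dict of tag -> text."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "") for child in root}


def query_server(host, port=377, timeout=1.0):
    """Return (stats, response_ms) for one server, or None if it does not answer."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            chunks = []
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # the server closed the socket: document complete
                    break
                chunks.append(chunk)
        stats = parse_stats(b"".join(chunks))
    except (OSError, ET.ParseError):
        return None  # unreachable, timed out, or sent something unparsable
    return stats, (time.monotonic() - start) * 1000.0


def poll_servers(servers, query=query_server):
    """Query every server in the predefined list; non-responders map to None."""
    return {host: query(host) for host in servers}
```

A caller such as ldm could run `poll_servers` on a timer at the configured interval and keep only the most recent result, so that a choice is ready when the user logs in.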

It's very hard to pick the "best" server. Here are some options:

  1. Implemented: Provide the administrator with tunable parameters, such as ignoring servers that are swapping, ignoring servers that take more than x ms to respond, and ignoring servers with a load higher than x (normalized to the number of processors). A random server is picked from those that remain. If no candidates remain, a random server is picked from the list of servers that responded.
  2. Implemented: Throw away the non-responding servers and pick a random one from the remaining servers (random tends to make a good choice a high percentage of the time).
  3. Unimplemented: Each client uses the same server (dynamically eliminating unresponsive servers from the list of what's available) every session. This can be done by hashing the TC MAC address and taking a modulus into the list of available servers.
  4. Unimplemented: The user is presented with a list of available servers (preferably with some stats about the server load) and chooses the preferred server. One server can be automatically recommended. (This is a variation of the first option, above.)
    • Status info may include: "Down", "Idle", "Busy", "Swapped"
    • The chooser should auto-refresh periodically.
  5. We will modify ltsp-update-sshkeys to make configuring additional servers easy. It will check whether /etc/ltsp/extraservers exists; if so, it will connect to the listed servers, retrieve their keys, and append them to the ssh_known_hosts file in the chroot.
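The selection strategies in options 1 to 3 above can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: the function names, the per-server dictionary keys (`load`, `cpus`, `swapping`, `response_ms`), the default thresholds, and the use of SHA-256 for hashing the MAC address are all hypothetical. A server that did not respond is represented by `None`.

```python
import hashlib
import random


def pick_random(stats_by_server, rng=random):
    """Option 2: drop non-responding servers, then pick one at random."""
    responding = sorted(h for h, s in stats_by_server.items() if s is not None)
    return rng.choice(responding) if responding else None


def pick_filtered(stats_by_server, max_load=1.0, max_ms=200.0, rng=random):
    """Option 1: filter by tunable limits, then pick at random.

    Falls back to any responding server when every candidate is filtered out.
    """
    responding = {h: s for h, s in stats_by_server.items() if s is not None}
    if not responding:
        return None
    candidates = sorted(
        host
        for host, s in responding.items()
        if not s["swapping"]
        and s["response_ms"] <= max_ms
        and s["load"] / s["cpus"] <= max_load  # load normalized per processor
    )
    return rng.choice(candidates or sorted(responding))


def pick_by_mac(mac, available_servers):
    """Option 3: hash the thin client's MAC address, take it modulo the list size."""
    servers = sorted(available_servers)  # stable order so the mapping is repeatable
    if not servers:
        return None
    digest = hashlib.sha256(mac.lower().encode("ascii")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With a simple modulus, a given client keeps its server only while the list of available servers is unchanged. When a server drops out, many clients are remapped, not just the ones that were using the missing server.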

Unimplemented Pieces

Because the existing code for querying servers is in Python and the next-generation greeter is written in C, it's impossible to use the already written module directly. Also, there are plans for a future non-Gtk greeter (Qt comes to mind). The solution is to use an IPC mechanism between ldm and the greeter, through which ldm can provide information about the servers to the greeter. The mechanism that comes to mind is POSIX message queues.

TODO: protocol structure

Outstanding Issues

  • Document the XML schema.
  • Clean up the code of the server component.
  • The default right now is to scan the server list every 5 seconds. That behavior will become optional, and the default will change to querying the servers only once, when the user "logs in".
  • Greeter integration.
  • ltsp-update-sshkeys needs to import keys from multiple servers.

BoF agenda and discussion


CategorySpec