LDMLoadbalancingSupport

Summary

Right now with LTSP in Ubuntu there is no good solution for using more than one server for thin client deployments.

Rationale

Use Cases

A real-life scenario is our thin client deployment at the School of Engineering at Oakland University. Initially, we had Sun SunRays backed by the Linux port of the SunRay services. Since then we've moved away from SunRays to real X11 terminals based on LTSP support in Ubuntu. Some of the terminals are dedicated X11 terminals that were purchased at the beginning of this calendar year; other terminals are desktop machines that can be booted into "thin client mode" via a PXE boot menu.

The one feature that was lost in the transition from SunRays to LTSP is support for multiple servers. Right now all thin clients run off of one server, out of the two available. This summer the School of Engineering is getting three more servers. Here at the university a lot of the applications our users run are engineering applications (math, physics simulations & CAD) which tax the systems, which is why we need multiple servers and the ability to grow by adding more servers.

Scope

Design

Implementation

The first iteration of the implementation is available at: http://ltsp.mindtouchsoftware.com/ltsp-loadbalance.

It consists of a stand-alone server component (ltsp-server-advertise) that provides clients with statistics about the resources available. It is implemented in Python as a daemon. The application waits for incoming requests on a port (currently 377), returns an XML document with the statistics, and closes the socket.
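
A minimal sketch of what such a daemon could look like follows. The XML layout, field names and the statistics gathered here are assumptions for illustration, not the documented ltsp-server-advertise schema (which, per Outstanding Issues below, still needs to be written down).

{{{#!python
# Illustrative sketch only: XML layout, field names and the probed statistics
# are assumptions, not the actual ltsp-server-advertise schema.
import os
import socket

PORT = 377  # port the daemon currently listens on


def read_swap_used_kb():
    """Rough 'is this server swapping?' indicator taken from /proc/meminfo."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def collect_stats():
    """Gather a few host statistics and wrap them in a small XML document."""
    load1, _load5, _load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    return (
        f"<stats>"
        f"<loadavg>{load1:.2f}</loadavg>"
        f"<cpus>{cpus}</cpus>"
        f"<swapused>{read_swap_used_kb()}</swapused>"
        f"</stats>"
    )


def serve():
    """Answer each incoming connection with one XML document, then close it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(5)
    while True:
        conn, _addr = srv.accept()
        try:
            conn.sendall(collect_stats().encode("utf-8"))
        finally:
            conn.close()


if __name__ == "__main__":
    serve()
}}}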

The client component is mostly self-contained in its own module (pickserver.py). This module is integrated with ldm, where it periodically queries the servers from a predefined list (right now the default interval is 5 seconds; a more sensible interval would be every 30 seconds or a minute). As soon as the user logs in, the best server is already available. Changes required to ldm are minimally invasive since, as stated above, most of the code is split into a separate module.
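
For illustration, here is a rough sketch of the querying side, assuming the port and XML fields from the server sketch above; the timeout, interval and field names are placeholders, not necessarily what pickserver.py really uses.

{{{#!python
# Illustrative sketch: port, timeout, interval and XML field names are
# assumptions for this example, not necessarily what pickserver.py uses.
import socket
import time
import xml.etree.ElementTree as ET

PORT = 377
TIMEOUT = 2.0        # seconds before a server is treated as non-responding
QUERY_INTERVAL = 30  # seconds; the current 5-second default is too aggressive


def query_server(host):
    """Fetch one server's XML statistics; return a dict, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, PORT), timeout=TIMEOUT) as conn:
            data = b""
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
        stats = {child.tag: child.text for child in ET.fromstring(data)}
        stats["response_ms"] = (time.monotonic() - start) * 1000.0
        return stats
    except (OSError, ET.ParseError):
        return None  # unreachable or malformed answers count as non-responding


def query_all(server_list):
    """Poll every configured server once; meant to be called periodically."""
    return {host: query_server(host) for host in server_list}
}}}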

It's very hard to pick the "best" server. Right now there are two approaches:

1. Throw away the non-responding servers and pick a random one from the remaining servers (random tends to make a good choice a high percentage of the time).
2. The other option gives the administrator some tunable parameters, such as ignoring servers that are swapping, ignoring servers that take more than x ms to respond, and ignoring servers whose load is higher than x (normalized to the number of processors). A random choice is then made from the remaining servers. If no candidates are available, a random server is picked from the list of servers that responded.
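
To make the two strategies concrete, here is a sketch of how the selection could look, assuming each responding server reports the load average, processor count, swap usage and response time from the earlier sketches; the default thresholds are illustrative, not the actual tunables.

{{{#!python
# Illustrative sketch of the two selection strategies described above; the
# field names and default thresholds are assumptions, not pickserver.py's own.
import random


def pick_random_responding(stats):
    """Strategy 1: drop non-responding servers, pick at random from the rest."""
    responding = [host for host, s in stats.items() if s is not None]
    return random.choice(responding) if responding else None


def pick_with_thresholds(stats, max_load_per_cpu=2.0, max_response_ms=500.0,
                         ignore_swapping=True):
    """Strategy 2: filter servers by tunable thresholds, then pick at random.

    Falls back to a random responding server when nothing passes the filters.
    """
    responding = {h: s for h, s in stats.items() if s is not None}
    candidates = []
    for host, s in responding.items():
        if float(s.get("response_ms", 0.0)) > max_response_ms:
            continue  # took too long to respond
        if ignore_swapping and int(s.get("swapused", 0)) > 0:
            continue  # server is (probably) swapping
        load = float(s.get("loadavg", "inf"))
        cpus = int(s.get("cpus", 1)) or 1
        if load / cpus > max_load_per_cpu:
            continue  # load too high, normalized to the number of processors
        candidates.append(host)
    pool = candidates or list(responding)
    return random.choice(pool) if pool else None
}}}

In this sketch, ldm would call query_all() on its predefined server list and hand the result to one of the two pickers just before presenting the login.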

Implementation Plan

Outstanding Issues

* Document the XML schema.
* Clean up the code of the server component.
* Even though the default right now is to scan the server list every 5 seconds, that behavior will become optional. The default behavior will be changed to only query the servers once when the user "logs in".

BoF agenda and discussion


CategorySpec
