LDMLoadbalancingSupport

  • Launchpad Entry: ldm-load-balancing

  • Created: 2007-04-30

  • Contributors: Milosz Tanski, Francis Giraldeau

  • Packages affected: ltsp-server, ltsp-client

Summary

This specification is about extending LTSP to support multiple application servers.

Rationale

To support larger rollouts of LTSP, you could use a bigger machine or add many smaller ones. The latter is related to the idea of Beowulf clusters, which are built from commodity hardware. The goal is to make it possible to add any number of additional secondary application servers to the network and dispatch users onto them.

Use Cases

Rudolfo has 1400 thin clients and 30 application servers. He would like to balance user sessions across the available application servers, and he wants to be able to add new servers as his needs grow.

Margot teaches computer science at a school in a rural area, and she has 17 recycled thin clients available for her lab, but she does not have a server available to support the 17 TCs. Instead, she has scraped together three desktop-class machines for this purpose, and she wants to be able to support all 17 TCs with concurrent logins.

Scope

This spec is concerned with load balancing login sessions across multiple application servers. It covers:

  • Support for multiple load distribution algorithms
  • LDM as the primary focus, though XDMCP should also be supported
  • The integration of ltsp-loadbalancer
  • Making sure that other tools keep working in such a setup

Design

In LDM, the script $CHROOT/usr/lib/ltsp/get_hosts, if it exists, returns the server or the list of servers to log in to. In Gutsy, this script has to be written by the administrator.

A default implementation should be provided with the following algorithms (a sketch of the selection logic follows the list):

  • ltsp-loadbalancer client
  • randomized: pick one server at random
  • best rated: pick the best-rated server
  • weighted best rated: randomize, but give better-rated servers a higher probability
  • fixed: load balancing disabled
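
As an illustration only (not the actual get_hosts interface or the real ltsp-loadbalancer client), the Python sketch below shows how the randomized, best rated, weighted best rated, and fixed strategies could be expressed, assuming a list of (hostname, rating) pairs has already been obtained. The function name, algorithm labels, and example ratings are made up for this sketch.

{{{#!python
import random

def pick_server(rated_servers, algorithm="randomized"):
    """Pick one server from [(hostname, rating), ...]; a higher rating is better.

    Illustrative sketch only: the rating source and algorithm labels are assumptions.
    """
    if not rated_servers:
        raise ValueError("no servers available")
    if algorithm == "fixed":
        # Load balancing disabled: always use the first configured server.
        return rated_servers[0][0]
    if algorithm == "best rated":
        # Take the server with the highest rating.
        return max(rated_servers, key=lambda entry: entry[1])[0]
    if algorithm == "weighted best rated":
        # Randomize, but give better-rated servers a higher probability.
        total = sum(rating for _, rating in rated_servers)
        threshold = random.uniform(0, total)
        running = 0.0
        for host, rating in rated_servers:
            running += rating
            if running >= threshold:
                return host
        return rated_servers[-1][0]
    # Default "randomized": take one server at random.
    return random.choice(rated_servers)[0]

# Example call with made-up hosts and ratings:
print(pick_server([("appserv1", 0.9), ("appserv2", 0.4)], "weighted best rated"))
}}}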

For advanced load balancing, ltsp-loadbalancer would be used. Smaller setups could use the other algorithms to ease administration. The algorithm will be selected by setting a variable in the lts.conf file.

The get_hosts program should be kept small and efficient, because it runs on the thin client.

LDM can display several servers in the GUI.

For XDMCP setups, only the first server in the list will be used for the query.

Implementation

Outstanding Issues

BoF agenda and discussion

Prototype (deprecated)

The first iteration of the implementation is available at: http://ltsp.mindtouchsoftware.com/ltsp-loadbalance.

It consists of a stand-alone server component (ltsp-server-advertise) that provides clients with statistics about the available resources. It is implemented in Python as a daemon. The application waits for incoming queries on a port (currently 377), returns an XML document with the statistics, and closes the socket.
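
The XML schema is not documented here, so the sketch below only covers the query side of the protocol as described above: open a TCP connection to the advertise port, read the XML document until the server closes the socket, and parse it. The port number 377 comes from this page; the function name, timeout, and error handling are assumptions.

{{{#!python
import socket
import xml.etree.ElementTree as ET

ADVERTISE_PORT = 377  # port currently used by ltsp-server-advertise, per this page

def query_stats(host, port=ADVERTISE_PORT, timeout=2.0):
    """Fetch the statistics XML document from one application server.

    Returns the parsed XML root element, or None if the server does not
    respond or sends something that is not well-formed XML. No particular
    schema is assumed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the daemon closes the socket after sending
                    break
                chunks.append(data)
    except OSError:
        return None
    try:
        return ET.fromstring(b"".join(chunks))
    except ET.ParseError:
        return None
}}}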

The client component is mostly self-contained in its own module (pickserver.py). This module is integrated with ldm, where it periodically (right now the default is 5 seconds; a more sensible default would be every 30 seconds or a minute) queries the servers from a predefined list. As soon as the user logs in, the best server is already known. The changes required to ldm are minimally invasive, since most of the code (as stated above) is split into a separate module.
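
A minimal sketch of the periodic polling idea, assuming the query_stats helper from the previous sketch; the class name, the 30-second default, and the notion of a "responding" server are illustrative, not the real pickserver.py code.

{{{#!python
import threading
import time

class ServerPoller:
    """Keep a background view of the configured servers up to date."""

    def __init__(self, servers, interval=30.0):
        self.servers = servers
        self.interval = interval          # poll every 30 seconds, not every 5
        self.latest = {}                  # hostname -> parsed stats or None
        self._lock = threading.Lock()
        thread = threading.Thread(target=self._poll_loop, daemon=True)
        thread.start()

    def _poll_loop(self):
        while True:
            results = {host: query_stats(host) for host in self.servers}
            with self._lock:
                self.latest = results
            time.sleep(self.interval)

    def responding_servers(self):
        """Return the hosts that answered the most recent poll."""
        with self._lock:
            return [host for host, stats in self.latest.items() if stats is not None]
}}}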

It's very hard to pick the "best" server. Here are some options:

  1. Implemented: The administrator is given tunable parameters, such as ignoring servers that are swapping, servers that take more than x ms to respond, or servers whose load is higher than x (normalized to the number of processors). A random choice is then made among the remaining servers. If no candidates remain, a random server is picked from the list of servers that responded.
  2. Implemented: Throw away the non-responding servers and pick a random one from the remaining servers (random tends to make a good choice a high percentage of the time).
  3. Unimplemented: Each client uses the same server every session (dynamically eliminating unresponsive servers from the list of what is available). This can be done by hashing the TC MAC address and taking the result modulo the number of available servers (see the sketch after this list).
  4. Unimplemented: The user is presented with a list of available servers (preferably with some stats about the server load) and chooses the preferred server. One server can be automatically recommended. (This is a variation of the first option, above.)
    • Status info may include: "Down", "Idle", "Busy", "Swapped"
    • The chooser should auto-refresh periodically.
  • We will modify ltsp-update-sshkeys to make configuration of additional servers easy: it will check whether /etc/ltsp/extraservers exists and, if so, connect to the listed servers, retrieve their keys, and append them to the ssh_known_hosts file in the chroot.
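
As a rough sketch of option 3 above (the MAC-address hashing scheme), assuming the client's MAC address and the list of currently responding servers are already known; the helper name and example values are hypothetical.

{{{#!python
import hashlib

def sticky_server(mac_address, responding_servers):
    """Deterministically map a thin client to one of the responding servers.

    The same MAC address maps to the same server for as long as the list of
    responding servers stays the same; unresponsive servers should be removed
    from the list before calling this.
    """
    if not responding_servers:
        raise ValueError("no servers responded")
    # Hash the MAC address and take the result modulo the number of servers.
    digest = hashlib.md5(mac_address.lower().encode("ascii")).hexdigest()
    index = int(digest, 16) % len(responding_servers)
    return responding_servers[index]

# Example with made-up values:
print(sticky_server("00:16:3e:aa:bb:cc", ["appserv1", "appserv2", "appserv3"]))
}}}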

Because the existing code for querying servers is in Python and the next-generation greeter is written in C, it is impossible to reuse the already-written module directly. Also, there are plans for a future non-Gtk greeter (Qt comes to mind). The solution is to use an IPC mechanism between ldm and the greeter, through which ldm can provide information about the servers to the greeter. The mechanism that comes to mind is POSIX message queues.
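
As a rough illustration of the proposed direction only: the snippet below uses the third-party posix_ipc Python module (an assumption; the spec does not name a library) to pass a simple server-status line from ldm to the greeter over a POSIX message queue. The queue name and message format are made up.

{{{#!python
import posix_ipc  # third-party module, assumed available for this sketch

QUEUE_NAME = "/ldm_server_info"  # hypothetical queue name

# ldm side: publish the currently recommended server for the greeter to read.
mq = posix_ipc.MessageQueue(QUEUE_NAME, posix_ipc.O_CREAT)
mq.send(b"best=appserv2 status=Idle")  # made-up message format

# greeter side: read the latest recommendation (a real greeter would loop).
message, _priority = mq.receive()
print(message.decode())

mq.close()
mq.unlink()
}}}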


CategorySpec
