UbuntuEdgyClusters

ADD YOUR COMMENTS AT THE BOTTOM IN THE PROPER SECTION!

Summary

To get good support for high performance and availability clusters in Ubuntu, we need to address the following problems:

  • Currently, there's little shared file system support
  • Little support for failover and HA
  • MPI support is lacking.

Rationale

Breezy was the first Ubuntu release with server CDs, and people are already deploying Ubuntu in server environments. There is demand for a free Linux platform for HA and HPC. This is an area where most non-commercial distributions are quite lacking, so it is a good way to attract a new crowd to Ubuntu, as well as to help dispel the 'desktop-only' distribution image by showing Ubuntu in a cutting-edge server light.

Use cases

  • Company alpha wants to put 250 of its Apache web application servers into a load-balanced pool, gaining high availability as a side benefit.
  • Company beta is developing a huge Oracle DB based application. They need scalability and availability. Their solution ends up being built on Ubuntu with OCFS2 as the backend filesystem.
  • University gamma is developing a new, extremely complex algorithm to push scientific frontiers, and needs an HPC cluster. Of course, they want Ubuntu as their development and computation platform because it provides an integrated environment for both.
  • Hospital delta is deploying hundreds of thin clients with Ubuntu and LTSP, and wants complete failover for the terminal services.

Implementation

For Edgy, we will specifically support the following individual clustering components:

  • GFS2 filesystem (done)
  • GFS filesystem (done)
  • OCFS2 filesystem (done)

Postponed:

  • the DRBD shared-device solution (HA) (to be re-evaluated; it was a no-go for Dapper)

  • the ganglia monitoring solution (HA, HPC)

  • Lustre's public releases no longer lag the commercial version, which might make a high-performance cluster filesystem possible in Ubuntu.
  • Open MPI should probably be checked out (fairly new free MPI implementation).
  • the Maui scheduler to go with SLURM, for more advanced scheduling policies, fair sharing, etc.

This means we will:

  • update GFS2 and OCFS2 (done; see the shared-filesystem sketch after this list)
  • promote ganglia, drbd8-utils from universe to main (postponed)
  • re-evaluate Lustre and drbd for inclusion (postponed)
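
As a purely informational aside (not part of the spec), the sketch below shows the kind of cross-node coordination a shared cluster filesystem such as OCFS2 or GFS2 gives applications: every node mounts the same filesystem, and an ordinary POSIX file lock taken on one node is honoured on all the others. The mount point /mnt/shared and the log file name are hypothetical.

    #!/usr/bin/env python
    # Sketch only: append to a file on a shared cluster filesystem under an
    # exclusive POSIX lock. With OCFS2/GFS2 mounted at the same path on every
    # node, the lock is arbitrated cluster-wide, so writers on different nodes
    # cannot interleave their records.
    # Assumption: /mnt/shared is a hypothetical cluster-filesystem mount point.

    import fcntl
    import os
    import socket
    import time

    LOGFILE = "/mnt/shared/cluster.log"

    def append_record(text):
        """Append one line to the shared log while holding an exclusive lock."""
        with open(LOGFILE, "a") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)      # blocks until the lock is granted
            try:
                f.write("%s %s %s\n" % (time.ctime(), socket.gethostname(), text))
                f.flush()
                os.fsync(f.fileno())           # push the record to stable storage
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)  # release the lock for other nodes

    if __name__ == "__main__":
        append_record("node is alive")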

Outstanding issues

  • SLURM's licence is the GPL and it links with OpenSSL, which is a no-go. Consider porting it to a GPL-compatible TLS library.

NOTES

  • The sections below are informational, and there is clearly no need to review them.

Software Evaluated in the process (and comments)

SLURM is the job queue/resource manager from the Lawrence Livermore National Laboratory, and is widely used for resource management in high-end clusters, scaling up to BlueGene/L's 65 thousand nodes with twice as many processors. Will integrate.
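
Purely as an illustration of what an integrated SLURM would look like to a user (none of this is part of the spec), jobs are driven through SLURM's standard command-line tools such as sinfo and srun; the sketch below just wraps them with Python's subprocess module. The node count and the command to run are placeholders.

    #!/usr/bin/env python
    # Sketch only: drive SLURM through its standard command-line tools.
    # Assumes the SLURM packages are installed and a cluster is configured;
    # the node count and the command to run are placeholders.

    import subprocess

    def cluster_state():
        """Return the output of sinfo, SLURM's partition/node status report."""
        return subprocess.check_output(["sinfo"]).decode()

    def run_on_nodes(command, nodes=2):
        """Run `command` on `nodes` nodes via srun and return its output."""
        return subprocess.check_output(["srun", "-N", str(nodes)] + command).decode()

    if __name__ == "__main__":
        print(cluster_state())
        print(run_on_nodes(["hostname"]))  # each allocated node reports its name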

OSCAR and gridengine are extremely big and complex, and they do not come from the Debian/Ubuntu world. Part of the code will probably need porting from Red Hat/Solaris. Will not integrate.

Ganglia is a widely used cluster monitoring tool. Will promote from universe to main.
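
For reference only: by default ganglia's gmond daemon answers any TCP connection on port 8649 with an XML dump of the cluster state, so monitoring data can be pulled without extra tooling. The host name "headnode" in the sketch below is a placeholder.

    #!/usr/bin/env python
    # Sketch only: read the cluster state that ganglia's gmond publishes.
    # By default gmond replies to any TCP connection on port 8649 with an XML
    # document describing all known hosts and metrics. "headnode" is a
    # placeholder for whichever machine runs gmond.

    import socket
    import xml.dom.minidom

    def read_gmond_xml(host="headnode", port=8649):
        """Connect to gmond and return the full XML dump as a string."""
        sock = socket.create_connection((host, port))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        sock.close()
        return b"".join(chunks).decode()

    if __name__ == "__main__":
        doc = xml.dom.minidom.parseString(read_gmond_xml())
        for host in doc.getElementsByTagName("HOST"):
            print(host.getAttribute("NAME"))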

Comments

Questions that were raised before the BoF:

  • How to make sure all nodes are up to date?
  • How to make sure only the users that are currently running jobs on a node have access to it?
  • How to distribute /etc/passwd and/or any other user-related information? Or should LDAP+Kerberos be used?
  • How to handle optimized mathlibs?
  • How to handle various versions of MPI?
    • These are 'just another package'. We package them up and throw them in the archive (a minimal MPI example follows this list).
  • Are Single System Image clusters in scope? How about diskless clusters?
  • Other than just basic clustering, there are a lot of other things users associate with clusters for high performance computing: profiling, sandboxing, tuning, third-party vendor support and so on.
    • No; with this spec we define which packages we want to provide and support. Diskless clusters can be the subject of another spec.
  • What about including, configuring, etc. the Globus Toolkit (http://www.globus.org/) in this area?

    • is it packaged already?
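
To make the 'just another package' answer about MPI concrete, here is a minimal hello-world sketch. It uses the mpi4py bindings purely for illustration; mpi4py and any particular MPI implementation (e.g. Open MPI) are assumptions, not part of the spec.

    #!/usr/bin/env python
    # Minimal MPI hello world, for illustration only.
    # Assumes an MPI implementation (e.g. Open MPI) and the mpi4py bindings
    # are installed; launch with something like:  mpirun -np 4 python hello_mpi.py

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()            # this process's id within the job
    size = comm.Get_size()            # total number of processes in the job
    node = MPI.Get_processor_name()   # typically the node's host name

    print("Hello from rank %d of %d on %s" % (rank, size, node))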

Other ideas from the community

* Building, integrating and packaging gLite (the new standard for the biggest production grid, EGEE). See the initial thoughts posted here

* Reviewing the possible integration and packaging of grid-enabled apps

* Suggestions: Narrow the scope as much as possible to facilitate progress. Focus initially on small clusters, simple NFS home directory sharing, ldap authentication, ganglia (+gexec?), simple management tools to synchronize configuration across nodes (comments above give pointers to other specs), MPI, and some form of batch scheduling. You can do a lot with just these (I may have left out some essentials, please note if so). Leave high availability and single-system image to other specs. Leave shared file systems for a second round. The key issue is packaging so that setting up a simple cluster is mostly painless. (Or perhaps cluster home/client installation modes on the CD.)

FabioMassimoDiNitto: Why do we need to regress from what we already have? Most of the things you mention are either outside the scope of a cluster spec or already done.

* ChristopherBayliss: who put his comment in the comments section, as it wasn't another idea.

  • In response to the Globus packaging question in the BoF comments:
    • Globus is packaged with the Globus/Grid Packaging Tools (GPT); the source installation of Globus just bootstraps a copy of GPT and then builds with it.

    • There are several incompatible versions of Globus in use: 2.x, 3.x and 4.0.x, with 4.1 and 4.2 in development.
      • 2.4 is used by the National Grid Service (NGS) in the UK.

    • Globus bundles a lot of other software along with Globus itself; a complete list of licences can be found here.

    • The "Quick Start" guide shows a basic GT4 cluster install.

    • The user will either have CA and machine certificates to plug into an existing organisation, or will want to set up a mini CA for local use. It would be nice if any package installer handled this sort of thing. (They actually use a patched version.)
    • Globus 4.0.2 does build happily under Dapper, although you do need to use -j1 with make.
    • The Python bindings mentioned on the Globus website are actually pyGlobus and pyGridWare.

    • The Java bindings are largely Axis 1.x and the "CoG Kit".

    • Globus' MDS service can integrate with Ganglia.
    • GRAM (remote job submission) supports PBS, Condor and LSF. PBS and LSF are commercial, and Condor only hands out source if you ask.

Personally, I think having Globus in Ubuntu would be great, but it looks like a big job. Perhaps someone with more/some packaging skills should guesstimate the effort required.

"FabioMassimoDiNitto replies: such as who? are you offering to help?"

ChristopherBayliss replies: Such as someone with experience of packaging big and complicated software from scratch. I have looked at it and realised that I don't have the requisite skills and/or time. Packaging Globus appears, at least to me, to be a task in itself. You can get debs of Globus from VDT, and various groups put together RPMs, which may be a viable starting point. How much effort is required to make them proper, well-behaved packages is another question.

"FabioMassimoDiNitto: Once the spec is approved, it must not be changed. The comments (no matter of what kind must be collected at the end of the page. Note the big title with such a warning is not a case. Your request of other people for packaging is inappropriate and in my opinion unfair since you have no idea who will pick up the task in edgy+1 and assuming that somebody will. Remember that this spec is closed, approved and in progress. Whatever you add goes for edgy+1, but most of the work from the community is already collected here to spare time later on."

AdrianRGoalby: The 'Use cases' above seem to concentrate on very large clusters, I think this is the wrong direction. There will be a demand for several such large clusters; but there are several thousand small companies, schools, departments of large organisations, individuals, etc. that would benefit from small high availability clusters. Large clusters are assembled using specialised hardware and are administered by full-time computer experts who configure and customise them as they require. Small clusters will probably use more standard computer and networking equipment and be expected to run with little administration, minimum configuration and no customisation. The aims and features of the OpenSSI (Single System Image) Clustering project http://wiki.openssi.org/go/Main_Page seem to be more appropriate to small clusters than the aims and features of UbuntuEdgyClusters. I think OpenSSI is being integrated into Debian already, I would suggest that the clustering effort of Ubuntu should be towards getting small OpenSSI clusters working well as part of Ubuntu Server. Imagine thousands of offices, each with a small stack of computer nodes, storage and networking in the corner, where the manager knows that the staff can continue working normally even if any part of the server fails. The manager will consider that well worth the cost of the extra hardware, provided it is relatively easy to install, administer and upgrade. Of course these thousands of small clusters should be running Ubuntu Server.


[MoeJette]: Our intent for Slurm version 1.3 (available late in 2007) is to have an encryption plugin that supports OpenSSL plus a GNU-compatible library. Regarding the OpenSSL license, how is that a problem? It seems to be completely open. Regarding Ubuntu, most of the Slurm developers already use it (even if it isn't used on our production clusters).
