ADD YOUR COMMENTS AT THE BOTTOM IN THE PROPER SECTION!

Summary

To get good support for high performance and availability clusters in Ubuntu, we need to address the following problems:

Rationale

Breezy was the first Ubuntu release with server CDs. Already, people are deploying Ubuntu in server environments. There is some demand for a free Linux platform for HA and HPC. This is an area where most non-commercial distributions are quite lacking, and a good way to attract a new crowd to Ubuntu, as well as to help dispel the 'desktop-only' distribution image by showing Ubuntu in a cutting-edge server light.

Use cases

Implementation

For Edgy, we will specifically support the following individual clustering components:

Postponed:

This means we will:

Outstanding issues

NOTES

Software Evaluated in the process (and comments)

SLURM is the job queue/resource manager from the Lawrence Livermore National Laboratory, and is widely used for resource management in high-end clusters, scaling up to BlueGene/L's 65 thousand nodes with twice as many processors. Will integrate.

Oscar and gridengine are extremely big and complex and they do not come from a Debian/Ubuntu world. Probably part of the code will need porting from RedHat/Solaris. Will not integrate.

Ganglia is a widely used cluster monitoring tool. Will promote from main.

Comments

Questions that were raised before the BoF:

Other ideas from the community

* Building, integrating and packaging Glite (the new standard for the biggest production grid: EGEE). See the initial thoughts posted here

* Reviewing the possible integration and packaging of grid-enabled apps

* Suggestions: Narrow the scope as much as possible to facilitate progress. Focus initially on small clusters, simple NFS home directory sharing, LDAP authentication, Ganglia (+gexec?), simple management tools to synchronize configuration across nodes (comments above give pointers to other specs), MPI, and some form of batch scheduling. You can do a lot with just these (I may have left out some essentials; please note if so). Leave high availability and single-system image to other specs. Leave shared file systems for a second round. The key issue is packaging so that setting up a simple cluster is mostly painless. (Or perhaps cluster home/client installation modes on the CD.)

FabioMassimoDiNitto: why do we need to regress from what we already have? Most of the stuff you mention is either not a cluster spec or already done.

* ChristopherBayliss: Who put his comment in the comments section as it wasn't an other idea.

Personally, having Globus in Ubuntu would be great, but it looks like a big job. Perhaps someone with more/some packaging skills should guesstimate the effort required.

"FabioMassimoDiNitto replies: such as who? are you offering to help?"

ChristopherBayliss replies: Such as someone with experience of packaging big and complicated software packages from scratch. I have looked at it and realised that I don't have the requisite skills and/or time. Packaging Globus appears, at least to me, to be a task in itself. You can get debs of Globus from VDT, and various groups put together RPMs, which may be a viable starting point. How much effort is required to make them proper, well-behaved packages is another question.

"FabioMassimoDiNitto: Once the spec is approved, it must not be changed. The comments (no matter of what kind) must be collected at the end of the page. Note that the big title with such a warning is not there by accident. Your request for other people to do the packaging is inappropriate and, in my opinion, unfair, since you have no idea who will pick up the task in edgy+1, and you are assuming that somebody will. Remember that this spec is closed, approved and in progress. Whatever you add goes for edgy+1, but most of the work from the community is already collected here to spare time later on."

AdrianRGoalby: The 'Use cases' above seem to concentrate on very large clusters, I think this is the wrong direction. There will be a demand for several such large clusters; but there are several thousand small companies, schools, departments of large organisations, individuals, etc. that would benefit from small high availability clusters. Large clusters are assembled using specialised hardware and are administered by full-time computer experts who configure and customise them as they require. Small clusters will probably use more standard computer and networking equipment and be expected to run with little administration, minimum configuration and no customisation. The aims and features of the OpenSSI (Single System Image) Clustering project http://wiki.openssi.org/go/Main_Page seem to be more appropriate to small clusters than the aims and features of UbuntuEdgyClusters. I think OpenSSI is being integrated into Debian already, I would suggest that the clustering effort of Ubuntu should be towards getting small OpenSSI clusters working well as part of Ubuntu Server. Imagine thousands of offices, each with a small stack of computer nodes, storage and networking in the corner, where the manager knows that the staff can continue working normally even if any part of the server fails. The manager will consider that well worth the cost of the extra hardware, provided it is relatively easy to install, administer and upgrade. Of course these thousands of small clusters should be running Ubuntu Server.


[MoeJette]: Our intent for Slurm version 1.3 (available late in 2007) is to have an encryption plugin that supports OpenSSL plus a GNU compatible library. Regarding the OpenSSL license, how is that a problem? It seems to be completely open? Regarding Ubuntu, most of the Slurm developers already use it (even if it isn't used on our production clusters).

UbuntuEdgyClusters (last edited 2008-08-06 16:41:22 by localhost)