Summary

This specification defines the steps deemed valuable and necessary to improve support for clustered file systems, especially in the cloud, for the Maverick release.

Release Note

GlusterFS support is included in the cloud-init tool to make deploying GlusterFS in the cloud simpler.

The Ceph clustered file system is available as a technology preview via the ceph package.

Rationale

In order to support our users building highly available, scalable distributed applications in both traditional and cloud-computing environments, we should foster the use of best-of-breed clustered file systems. GlusterFS has been researched and appears to be stable and to perform well. Ceph looks to be even more exciting for high-performance applications and should take advantage of Btrfs's native copy-on-write capabilities.

User stories

As a web applications developer I want to be able to utilize scalable storage for large data sets easily. I create cloud images with cloud-init specifying gluster-client and a configuration for finding the servers, and when I deploy these images they automatically have access to the clustered filesystem.

As a web applications developer I want to test my application on top of Ceph to see if it will be a viable option for the future. I install ceph, set up a small Ceph cluster, and then remove both when I am done.

Assumptions

Design

Gluster

Lucid has excellent Gluster server and client support in Universe, as synced from Debian. These packages should be kept current with the latest release of Gluster.

cloud-init

The ability to deploy images with gluster mounts or exports should be added to cloud-init.
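As a rough illustration of what such support might consume, the sketch below renders cloud-config user-data that installs a client package and mounts one Gluster volume. It relies only on the existing `packages` and `mounts` cloud-config keys. The helper names, the `glusterfs-client` package name, the host, the volume, and the mount point are all assumptions made for the example, and the `server:/volume` mount source assumes a GlusterFS release that accepts that form rather than a client volume file.

```python
# Hypothetical helpers: render cloud-config user-data that installs a Gluster
# client package and mounts one volume. All names and values are illustrative.

def gluster_mount_entry(server, volume, mountpoint):
    """Return fstab-style fields for a GlusterFS native-client mount."""
    # Assumes the "server:/volume" mount source form is supported.
    return [f"{server}:/{volume}", mountpoint, "glusterfs", "defaults,_netdev", "0", "0"]


def render_cloud_config(server, volume, mountpoint, package="glusterfs-client"):
    """Build a #cloud-config document as text (no YAML library needed)."""
    fields = gluster_mount_entry(server, volume, mountpoint)
    quoted = ", ".join(f'"{field}"' for field in fields)
    lines = [
        "#cloud-config",
        "packages:",
        f"  - {package}",
        "mounts:",
        f"  - [ {quoted} ]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cloud_config("gluster1.example.com", "webdata", "/mnt/webdata"))
```

The output can be passed as user-data when launching an instance. A dedicated Gluster section in cloud-init, if one is added, would likely look different.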

Ceph

Ceph is still marked as experimental, and the user space tools warn the user of that fact at every opportunity. However, it makes good sense to help people try it out now that it is in the upstream kernel.

Kernel

Ceph was merged into the kernel with the 2.6.34 release, which is currently the kernel of choice for Maverick.

The kernel team has committed to building ceph support as a module for Maverick.
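One way to confirm on a running system that the module is usable is to check whether the `ceph` file system type is registered with the kernel. The sketch below parses `/proc/filesystems` for that purpose; the function names are hypothetical. A module that is built but not yet loaded will not appear until it is loaded, for example with `modprobe ceph`.

```python
def registered_filesystems(proc_text):
    """Parse /proc/filesystems content into a set of filesystem type names."""
    names = set()
    for line in proc_text.splitlines():
        parts = line.split()
        if parts:
            # Lines are either "<name>" or "nodev <name>"; the name is last.
            names.add(parts[-1])
    return names


def ceph_registered(path="/proc/filesystems"):
    """Return True if the running kernel currently has 'ceph' registered."""
    with open(path) as handle:
        return "ceph" in registered_filesystems(handle.read())


if __name__ == "__main__":
    print("ceph registered:", ceph_registered())
```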

Upstream Debian Packages

The Ceph project already produces Debian packages. These packages should be reviewed and uploaded into Universe (or possibly directly into Debian?).

http://ceph.newdream.net/wiki/Debian

Implementation

Please see the whiteboard in server-maverick-cloud-gluster under WorkItems

Test/Demo Plan

Install gluster Servers

Test gluster clients

Test fault tolerance

Benchmark gluster vs. NFS

Post a blog entry to ubuntuserver.wordpress.com with the test results
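For the Gluster vs. NFS benchmark step, a very small sequential-write measurement could look like the sketch below. It is only a starting point under stated assumptions: the mount points `/mnt/gluster` and `/mnt/nfs` are hypothetical, the function name is invented for this example, and a real comparison would also need read, small-file, and concurrent-client workloads.

```python
import os
import tempfile
import time


def write_throughput(directory, size_mb=64, block_kb=1024):
    """Write size_mb of data into directory and return MB/s, including fsync."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            # Force data to the backing store so caching does not hide latency.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    # Hypothetical mount points; adjust to the actual test setup.
    for label, mountpoint in [("gluster", "/mnt/gluster"), ("nfs", "/mnt/nfs")]:
        if os.path.isdir(mountpoint):
            print(f"{label}: {write_throughput(mountpoint):.1f} MB/s")
```

Running it several times against each mount and reporting the spread, not just a single figure, would make the published results easier to trust.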

Unresolved issues

gluster NFS support

Gluster includes support to mount volumes via NFS. This may prove useful for legacy integration, but is being left out of this discussion.

Gluster Storage Platform (GlusterSP, Glusterweb)

glusterweb is only available in release form on gluster.org as a .src.rpm, which is fairly simple to extract but would require special handling for users wanting to update the package.

According to an upstream developer, these components of Gluster are not ready for packaging:

http://www.mail-archive.com/gluster-devel@nongnu.org/msg06924.html

Upon inspection, it is clear that they are designed only to be used on a dedicated management node. This might still be useful in cloud environments, where the management node could even be spawned only when needed.

Setting up such an image is beyond the scope of this blueprint. Some time should be devoted to contacting the glusterfs community to prepare for possibly packaging or contributing upstream to getting these components into a deployable state.

BoF agenda and discussion

Integration options:

  1. package in universe, ensure up to date (ceph)
  2. support into images (gluster)

Cloud file systems (need application support)

mogile fs (more of a datastore - object store)

hdfs (hadoop filesystem)

Sheepdog Project

Cluster filesystem (can be mounted in the fs, existing applications can use it unmodified)

gluster fs

ceph (+btrfs)

pvfs2 (martinbogo)

xtreemfs

gfs2, ocfs2. Defer to ha-cluster-stack blueprint.

tahoe-lafs


CategorySpec

MaverickClusterFilesystems (last edited 2010-08-19 20:15:49 by 76-216-240-245)