clustering

  • Launchpad Entry: TBD

  • Created: Roderick B. Greening

  • Contributors: Roderick B. Greening

  • Packages affected: TBD

Summary

Build a high-availability cluster suitable for an SMB. The cluster will contain three or more nodes, each able to run a variety of VMs. VMs should automatically fail over to another node in the cluster. Storage will be internal to each node, so a cluster file system synchronized over the network is required.

Release Note

SMBs will be able to easily create a cluster from a set of servers, with minimal maintenance required. The cluster will provide highly available services via VMs and failover, giving the SMB peace of mind.

Rationale

Clustering is a mystery to many administrators. This spec will help iron out the details of how to build a cluster in a cost-effective but productive manner.

User stories

As a real-world case, I have to provide services such as DNS, e-mail, web (private and public), EMS (element monitoring), and database. Under the current strategy, these run on separate systems with different OSes; they are hard to manage and maintain, and the arrangement is extremely inefficient.

Using a cluster and running the services within VMs will allow us to consolidate systems, provide higher availability, and make better use of resources. VMs can easily be moved from one node to another to distribute load, and VMs can be upgraded offline without downtime. We also get a common underlying OS for the cluster, which eases maintenance and management.

Assumptions

  • We will be using packages from Karmic (or higher).
  • High availability is required
  • Load balancing is not required (but would be nice - cloud computing?)
  • No external storage system
  • Minimum of 3 nodes (though it can be done with 2)
  • Nodes will have sufficient CPU/RAM/disk/LAN capacity to support the VMs/services, or can be upgraded to do so (i.e. CPUs, RAM, and disk can be added to each node over time to address growing needs)
  • Ability to grow the cluster by adding additional nodes

Design

See the discussion below with ivoks on #ubuntu-server; illustrative configuration sketches based on that discussion follow the log.

<rgreening> ivoks: ScottK suggested I ask you (as an expert) on a clustering/high-availability project I have.
<rgreening> ivoks: basically, I need to run a bunch of vm's with varied purposes (DNS, E-mail, EMS, Web) and ensure they are always reachable (always being relative of course). Basically survive failure of one node in the cluster by autostart on another node...
<rgreening> I'm only basically familiar with clustering... so any advice would be awesome.
<ivoks> ok, so... you want failover?
<rgreening> yeah, live...
<rgreening> not manual
<ivoks> right, live migration from failed node to alive one
<rgreening> but I also want to efficiently use the CPU, memory/ disk
<rgreening> which is why I think I need a cluster :)
<rgreening> I have 3 HP DL380G6 (brand new)
<rgreening> each has 4GB ram, 3x300 10k SAS disks
<rgreening> internal HW RAID card for disks
<rgreening> 4xGE ports per server
<rgreening> 2.4GHz quad core CPU
<ivoks> ok
<ivoks> having three servers is good step
<ivoks> since 2 isn't quite a smart solution :)
<ivoks> rgreening: you'll need drbd+ocfs+pacemaker
<ivoks> rgreening: or... redhat cluster suite + drbd
<rgreening> cool
<rgreening> ivoks: ocfs?
<ivoks> rgreening: oracle cluster filesystem
<rgreening> ah
<ivoks> ocfs2
<rgreening> is there a reason/circumstance to prefer one over the other?
<ivoks> rgreening: well, rhcs is pita to configure, but it's great stuff
<ivoks> rgreening: pacemaker is easier, but it didn't get real attention in ubuntu until karmic
<ivoks> rgreening: rhcs was supported clustering system in ubuntu
<rgreening> ok, I am thinking on starting with Karmic anyway... as this setup will be in test/development for a 4 month period at least
<ivoks> rgreening: and you'll have hard time finding documentation for both
<rgreening> I am all too familiar with the lack of docs... been search and researching last few days..
<ivoks> rgreening: great... then you could help us make our cluster stack rock solid
<rgreening> :P
<rgreening> lets make a deal. You point me in the right direction when I need a course change and I'll help with the cluster packages (since it'll benefit me anyway) :)
<rgreening> ivoks: ^ .. So, if it were you then using Karmic, oracle or red hat path?
<ivoks> rgreening: i'd use karmic, with pacemaker-openais as cluster manager and ocfs2 as filesystem on top of drbd8
<rgreening> ok. then my path is set. :)
<ivoks> and kvm :)
<rgreening> of course.
<ivoks> you might want to look at eucalyptus
<ivoks> you are basically asking for a cloud system
<rgreening> ivoks: is there a need to add a cloud on top of this?
<rgreening> or what advantage?
<ivoks> well, if you want load balancing...
<rgreening> if I am not talking to other clouds or EC2 services
<rgreening> hmm...
<rgreening> ivoks: I don't need load balancing per se but shouldn't the vm's share the CPU/memory/disk of the cluster? 
<rgreening> cluster...
<ivoks> rgreening: no
<ivoks> there are couple of clusters
<ivoks> one is high availability
<ivoks> that's what i was talking about
<ivoks> other is hpc
<rgreening> ok. my bad. I want HA not HPC...
<ivoks> that's sharing cpu/mem, but applications should be aware of that
<ivoks> so, in HA cluster, you have to designate VMs to certain servers
<ivoks> and backup solutions if those servers fail
<rgreening> ivoks: ok.
<ivoks> so, let's say vm1 and vm2 on serverA
<ivoks> vm3 and vm4 on serverB
<ivoks> vm5 and vm6 on server C
<ivoks> if serverC fails, vm5 moves to serverA and vm6 moves to serverB
<rgreening> ok..
<rgreening> and this can be preconfigured/determined right?
<ivoks> serverA doesn't know a thing about serverB or serverC
<ivoks> it only knows their IP addresses and where to shoot if it wants to kill them
<rgreening> sounds reasonable ivoks. 
<ivoks> rgreening: that's fail over
<ivoks> then, you'll need shared storage
<ivoks> that can be NAS or DRBD
<ivoks> DRBD is basically a RAID1 over the network
<ivoks> version in karmic supports having three nodes in primary-primary-primary setup
<rgreening> ok, so I have 4xGE on each server. and I have 3x300GB 10K SAS drives in HW RAID in each server...
<rgreening> so I don't need an external storage array?
<rgreening> I can get one, if it will give a large perf boost...
<ivoks> well, fully redundant fibre channel or 10Gb iSCSI would be better options, but let's pretend you don't have a couple of hundred thousand dollars :)
<rgreening> you'd be correct :)
<rgreening> lol
<rgreening> ivoks: you are awesome btw.
<ivoks> local disks are still the fastest thing
<ivoks> drbd will make them slower, but that's something rgreening will have to accept if he wants high availability
<rgreening> ivoks: 4xGE.. 
<ivoks> right, he could bond ethernets to form 2Gbps link
<rgreening> so, do I need to separate the drbd ports from the regular vlan ports?
<ivoks> switch between them is gigabit?
<rgreening> yeah, I'll have a Cisco 37xx GE or 4xxx GE
<ivoks> so, keep in mind
<ivoks> drbd link - for block device sync; you can use the same link for ocfs2 sync
<ivoks> cluster link - for communication between nodes
<ivoks> and wan
<rgreening> so, 2xGE drbd/ocfs2, 1xGE cluster link, 1xGE wan/internet/core vlans..
<ivoks> rgreening: right, you could use drbd/ocfs/cluster on same link
<ivoks> rgreening: and then bond them, making them faster and allowing for failover
<ivoks> if cable or network card dies
<rgreening> ivoks: ok, if I bond them, they can't go to different switches though, correct...
<ivoks> sort of...
<ivoks> you should look at ifenslave modes
<rgreening> ivoks: ok. cool
<rgreening> ivoks: where are you located?
<ivoks> croatia
<rgreening> cool
<ivoks> mode 2 could be interesting
<ivoks> i think that could work with different switches
<rgreening> ivoks: you have been such a huge help.
<rgreening> ty ty ty ivoks :)
<ivoks> :)
<ivoks> XOR policy: Transmit based on the selected transmit hash policy. The default policy is a simple
<ivoks> (source MAC address XOR destination MAC address) modulo slave count
-*- rgreening owes ivoks beers 'n food 'n stuff :)
<ivoks> or x% of your earnings on the project :)
-*- rgreening is trying to save his job/career by doing this, as well as the jobs of the rest of the office.
<ivoks> hehe
<rgreening> :)
<rgreening> ivoks: I am waiting to mount the servers. I assume, a default Ubuntu Server install is the correct path.
<rgreening> and go from there
<ivoks> rgreening: sure
<rgreening> ok. I guess I need to join the server team :)
<rgreening> hah
<ivoks> rgreening: but, really, if you are planning on using an HA cluster only for VMs
<ivoks> rgreening: you should invest some time in learning eucalyptus and ubuntu cloud strategy
<ivoks> i haven't looked at it yet, so i don't know details, but it might be what you are looking for
<rgreening> ivoks: would it still use the base stuff we already talked about?
<ivoks> i don't know
<ivoks> i have no idea what eucalyptus does exactly
<rgreening> ok... hehe
<ivoks> it's cloud :D
<rgreening> do you know who the expert to ask is?
<ivoks> ...anyone deploying eucalyptus? or knows what it does?
<rgreening> hehe
<ivoks> i'm sure there are
<ivoks> erichammond could give you some info about what cloud really is
<ivoks> i see him blogging about clouds all the time
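The sketches below illustrate the stack discussed in the log. They are starting points only; every hostname, interface name, device, and address in them is a placeholder. First, the bonded DRBD/cluster link could be set up with the ifenslave package. A minimal /etc/network/interfaces sketch, assuming eth0 and eth1 are dedicated to the replication link and balance-xor (mode 2) is the chosen policy; exact option names vary between ifenslave versions, so check the package documentation.

  # /etc/network/interfaces (sketch) - bond two GE ports for the DRBD/cluster link
  auto bond0
  iface bond0 inet static
      address 192.168.10.1        # per-node replication address (placeholder)
      netmask 255.255.255.0
      bond-slaves eth0 eth1       # the two ports dedicated to this link
      bond-mode balance-xor       # mode 2, the XOR hash policy mentioned above
      bond-miimon 100             # link monitoring interval in ms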
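On top of that link, DRBD replicates a local partition between nodes. A minimal two-node dual-primary sketch (a third node would typically be added in a stacked arrangement), assuming a resource named r0 backed by a spare partition /dev/sda3 on each node's hardware RAID volume:

  # /etc/drbd.d/r0.res (sketch) - shared block device for the cluster file system
  resource r0 {
      protocol C;                        # synchronous replication
      net { allow-two-primaries; }       # required for dual-primary (OCFS2) use
      startup { become-primary-on both; }
      on serverA {
          device    /dev/drbd0;
          disk      /dev/sda3;           # spare partition on the local RAID volume
          address   192.168.10.1:7788;   # bonded replication link
          meta-disk internal;
      }
      on serverB {
          device    /dev/drbd0;
          disk      /dev/sda3;
          address   192.168.10.2:7788;
          meta-disk internal;
      }
  }

The resulting /dev/drbd0 device would then be formatted with mkfs.ocfs2 and mounted on each node to hold the VM images.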
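Finally, pacemaker can manage each KVM guest as a cluster resource so it restarts on a surviving node when its preferred node fails. A crm shell sketch, assuming the ocf:heartbeat:VirtualDomain resource agent is available in the installed agents package and a guest vm1 is defined in libvirt:

  # crm configure (sketch) - one KVM guest managed by the cluster
  primitive vm1 ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/vm1.xml" hypervisor="qemu:///system" \
      op monitor interval="30s" timeout="60s" \
      op start timeout="120s" op stop timeout="120s"
  # prefer serverA, but allow the resource to move if serverA fails
  location vm1-prefers-serverA vm1 100: serverA

A similar primitive and location constraint would be added per VM, matching the vm1/vm2 on serverA, vm3/vm4 on serverB, vm5/vm6 on serverC layout from the discussion.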

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself, since any specification with problems cannot be approved.


CategorySpec
