Test cases for cluster components in Ubuntu 10.04
Overview
For these tests you'll need a couple of machines or KVM guests with Ubuntu 10.04. I strongly suggest three or more of them.
Each test will be enumerated. Following these steps you shouldn't have problems. Note that each step is marked with [ALL] or [ONE]. If it's marked with [ALL], you should repeat it on each server in your cluster. If it's marked with [ONE], pick one server and do that step only on that server.
Pacemaker, standalone
1. [ALL] Add testing PPA
Add this PPA to your /etc/apt/sources.list:
deb http://ppa.launchpad.net/ubuntu-ha/ppa/ubuntu lucid main
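If you prefer not to edit sources.list by hand, a minimal sketch of the same step is below (the sources.list.d filename is just an illustrative choice; the PPA's signing key still has to be imported, and the RHCS section below shows the apt-key command for this same PPA):
# append the PPA line to its own sources file, then refresh the package index
echo "deb http://ppa.launchpad.net/ubuntu-ha/ppa/ubuntu lucid main" | sudo tee /etc/apt/sources.list.d/ubuntu-ha-ppa.list
sudo apt-get update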
2. [ALL] install pacemaker
sudo apt-get install pacemaker
edit /etc/default/corosync and enable corosync (START=yes)
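If you want to script this step, a minimal sketch (assuming the file ships with a START=no line) is:
# switch the corosync init default from START=no to START=yes
sudo sed -i 's/^START=no/START=yes/' /etc/default/corosync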
3. [ONE] generate corosync authkey
sudo corosync-keygen
(this can take a while if there's not enough entropy; download an Ubuntu ISO image on the same machine while the key is being generated to speed it up, or type on the keyboard to generate entropy)
copy /etc/corosync/authkey to all servers that will form this cluster (make sure it is owned by root:root and has 400 permissions).
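A sketch of pushing the key to a second node (the hostname is a placeholder, and this assumes root SSH access between the nodes; adapt it if you copy via an unprivileged account):
# copy the key to another node and lock down its ownership and permissions
sudo scp /etc/corosync/authkey root@lucidcluster2:/etc/corosync/authkey
ssh root@lucidcluster2 'chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey'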
4. [ALL] configure corosync
In /etc/corosync/corosync.conf, replace bindnetaddr (by default it's 127.0.0.1) with the network address of your server, replacing the last digit with 0. For example, if your IP is 192.168.1.101, you would put 192.168.1.0.
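A sketch of the same edit with sed, using the 192.168.1.0 example network (check the file afterwards, since the exact default line may differ):
# point corosync at the cluster network instead of the loopback default
sudo sed -i 's/bindnetaddr: 127.0.0.1/bindnetaddr: 192.168.1.0/' /etc/corosync/corosync.conf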
5. [ALL] start corosync
sudo /etc/init.d/corosync start
Now your cluster is configured and ready to monitor, stop and start your services on all your cluster servers.
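To confirm the nodes have actually joined, a one-shot status check is handy at this point (sketch):
# print cluster membership and resource status once, then exit
sudo crm_mon -1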
6. [ALL] install services that will fail over between servers
In this example, I'm installing apache2 and vsftpd. You may install any other service...
sudo apt-get install apache2 vsftpd
Disable their init scripts:
sudo update-rc.d -f apache2 remove
sudo update-rc.d -f vsftpd remove
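Since pacemaker will now be responsible for these services, you may also want to stop any instances the packages started at install time (sketch):
sudo /etc/init.d/apache2 stop
sudo /etc/init.d/vsftpd stop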
7. [ONE] add some services
In this example, I'll create failover for the apache2 and vsftpd services. I'll also add two additional IPs and tie apache2 to one of them, while vsftpd will be grouped with the other.
sudo crm configure edit
If you get an empty file, close it, wait a couple of seconds (10-20), and try again. You should get something like this:
node lucidcluster1
node lucidcluster2
node lucidcluster3
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2"
Add the following lines below the 'node' declarations. Replace X.X.X.X and X.X.X.Y with the addresses that will fail over; do not put the IPs of your servers there. Do not save and exit after adding these lines:
primitive apache2 lsb:apache2 op monitor interval="5s"
primitive vsftpd lsb:vsftpd op monitor interval="5s"
primitive ip1 ocf:heartbeat:IPaddr2 params ip="X.X.X.X" nic="eth0"
primitive ip2 ocf:heartbeat:IPaddr2 params ip="X.X.X.Y" nic="eth0"
group group1 ip1 apache2
group group2 ip2 vsftpd
order apache_after_ip inf: ip1:start apache2:start
order vsftpd_after_ip inf: ip2:start vsftpd:start
Now that you've put some services into the configuration, you should also define how many servers are needed for a quorum and which stonith devices will be used. For this test, we won't use stonith devices.
Under property, add expected-quorum-votes and stonith-enabled so that it looks like this (don't forget the '\'!). Replace 'X' with the number of servers needed for a quorum (where N is the number of servers in the cluster, X should be less than or equal to N-1, but not 1 unless there are only two servers):
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="X" \
        stonith-enabled="false"
Save and quit.
8. [ALL] monitor and stress test
On each server start crm_mon (sudo crm_mon) and monitor how services are grouped and started. Then, one by one, reboot or shut down servers, leaving at least one running.
First test with a normal shutdown, then by pulling the AC plug (destroying the domains in KVM).
In all these cases, once the servers are back up, they should appear Online in crm_mon after some time. Services should migrate between them without problems.
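If your cluster nodes are KVM guests, "pulling the AC plug" can be simulated from the host with virsh (the domain name below is a placeholder; use your guest names):
# hard power-off a guest, equivalent to pulling the plug
virsh destroy lucidcluster1
# later, power it back on and watch it rejoin in crm_mon
virsh start lucidcluster1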
Pacemaker with DRBD
You will need at least two servers. Each of those two servers must have one empty partition of the same size. All other servers can be part of the pacemaker cluster, but will not have drbd resources started on them.
1. Complete test with standalone Pacemaker
2. [ALL] Install DRBD and other needed tools
sudo apt-get install linux-headers-server psmisc
sudo apt-get install drbd8-utils
Since we will be using pacemaker for stopping and starting drbd, remove it from the runlevels:
sudo update-rc.d -f drbd remove
3. [ALL] Set up DRBD
Create /etc/drbd.d/disk0.res file, containing:
resource disk0 {
        protocol C;
        net {
                cram-hmac-alg sha1;
                shared-secret "lucid";
        }
        on lucidclusterX {
                device /dev/drbd0;
                disk /dev/sdXY;
                address X.X.X.X:7788;
                meta-disk internal;
        }
        on lucidclusterY {
                device /dev/drbd0;
                disk /dev/sdXY;
                address X.X.X.Y:7788;
                meta-disk internal;
        }
}
Make sure to replace lucidclusterX and lucidclusterY with the real hostnames of your two servers. Change X.X.X.X and X.X.X.Y to the real IPs of those servers, and sdXY to the real partitions that will be used for drbd.
Once you've saved that file, create the resource:
sudo drbdadm create-md disk0
You should get:
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Finally, start drbd:
sudo /etc/init.d/drbd start
sudo drbdadm status should return:
<resource minor="0" name="disk0" cs="Connected" ro1="Secondary" ro2="Secondary" ds1="Inconsistent" ds2="Inconsistent" />
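To keep an eye on the connection state and, later, the sync progress, watching /proc/drbd is also handy (sketch):
# refresh the DRBD state display once per second
watch -n1 cat /proc/drbd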
4. [ONE] Create filesystem
One of your servers will act as the primary server to start with. You'll use it to create the filesystem and force the other node to sync from it. On the chosen server, force it to be primary and create the filesystem:
sudo drbdadm -- --overwrite-data-of-peer primary disk0
sudo mkfs.ext3 /dev/drbd/by-res/disk0
5. [ONE] DRBD+Pacemaker
Edit pacemaker configuration:
crm configure edit
and add:
primitive drbd_disk ocf:linbit:drbd \
        params drbd_resource="disk0" \
        op monitor interval="15s"
primitive fs_drbd ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/disk0" directory="/mnt" fstype="ext3"
ms ms_drbd drbd_disk \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location loc-1 fs_drbd -inf: lucidclusterX
location loc-2 drbd_disk -inf: lucidclusterX
colocation mnt_on_master inf: fs_drbd ms_drbd:Master
order mount_after_drbd inf: ms_drbd:promote fs_drbd:start
Replace lucidclusterX with the hostname of the node that doesn't have drbd. Save and fire up crm_mon. You should get something like this:
============
Last updated: Wed Jan 13 18:03:12 2010
Stack: openais
Current DC: lucidcluster2 - partition with quorum
Version: 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
3 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ lucidcluster2 lucidcluster3 lucidcluster1 ]

 Resource Group: group1
     ip1        (ocf::heartbeat:IPaddr2):       Started lucidcluster2
     apache2    (lsb:apache2):                  Started lucidcluster2
 Resource Group: group2
     ip2        (ocf::heartbeat:IPaddr2):       Started lucidcluster3
     vsftpd     (lsb:vsftpd):                   Started lucidcluster3
 Master/Slave Set: ms_drbd
     Masters: [ lucidcluster2 ]
     Slaves: [ lucidcluster1 ]
 fs_drbd        (ocf::heartbeat:Filesystem):    Started lucidcluster2
6. [ALL] Testing
Wait for drbd disks to get synced and start rebooting/killing your nodes.
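For example, once the sync has finished, a quick check on the current master followed by a reboot exercises the failover (a sketch; run this on the node that currently holds the Master role in crm_mon):
# confirm pacemaker mounted the DRBD-backed filesystem here
mount | grep drbd
# then reboot and watch crm_mon on another node for the master role to migrate
sudo reboot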
Test results
Name | Test | Passed/Failed | Comments
Questions
BONUS: RHCS Samba file server cluster
This guide is an early draft.
Overview
Create a fully functional 2 node cluster, offering an active/active samba file server on shared storage.
Testing environment
- A standard x86_64 pc running libvirt and virt-manager
- 2 kvm guests to act as 2 nodes
- A shared raw virtio image to act as shared storage
Cluster components :
- Redhat Cluster Suite 3.0.6
- Cluster LVM
- GFS2
- Samba + CTDB
Network : 192.168.122.0/24, gateway : 192.168.122.1
- node01 192.168.122.201
- node02 192.168.122.202
Cluster Configuration Steps
- [HOST] : Steps to be done on the KVM host.
- [ONE] : Steps to be done on only ONE node.
- [ALL] : Steps to be done on all nodes.
[HOST] Setup the host
- Create 2 KVM guests; I strongly suggest using libvirt since it will provide a fencing method for the nodes.
- Add a shared raw disk image with cache=off to mimic the shared storage
- Install the 2 nodes with the latest lucid-server-iso
[ALL] Prepare the nodes
Assign a static IP to each node and add both entries to the hosts file on both nodes.
Add the ubuntu-ha experimental PPA to the sources list:
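With the addresses used in this guide, the hosts file entries on both nodes would look like this:
192.168.122.201   node01
192.168.122.202   node02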
deb http://ppa.launchpad.net/ubuntu-ha/ppa/ubuntu lucid main
deb-src http://ppa.launchpad.net/ubuntu-ha/ppa/ubuntu lucid main
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B64F0AFA
# apt-get update
Install Redhat Cluster Suite
# apt-get install redhat-cluster-suite
[ONE] Prepare the shared drive
Partition the shared storage: one small partition for the quorum disk (50MB) and the rest for the clustered LVM.
# parted /dev/vdb mklabel msdos
# parted /dev/vdb mkpart primary 0 50MB
# parted /dev/vdb mkpart primary 50MB 100%
# parted /dev/vdb set 2 lvm on
Create the quorum disk; the label (-l) will be used in the cluster configuration.
# mkqdisk -l bar01 -c /dev/vdb1
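You can verify that the label was written before moving on (a sketch; mkqdisk -L should list the quorum devices and labels it finds):
# mkqdisk -L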
[ALL]
Reread the partition table
# partprobe
Copy the cluster config file to /etc/cluster/cluster.conf. TODO: Detail the cluster config file.
<?xml version="1.0"?>
<cluster name="Foo01" config_version="1">

  <!-- 1 vote per node and 1 vote for the quorum disk, the shared storage is the tie-breaker -->
  <cman two_node="0" expected_votes="3"/>

  <!-- Configure the quorum disk -->
  <quorumd interval="1" tko="10" votes="1" label="bar01">
    <heuristic program="ping 192.168.122.1 -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

  <!-- Leave a grace period of 20 second for nodes to join -->
  <fence_daemon post_join_delay="20"/>

  <!-- Enable debug logging -->
  <logging debug="off"/>

  <!-- Nodes definition (node ids are mandatory and have to be below 16)-->
  <clusternodes>
    <clusternode name="node01" nodeid="1">
      <fence>
        <method name="virsh">
          <device name="virsh" port="node01" action="reboot"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02" nodeid="2">
      <fence>
        <method name="virsh">
          <device name="virsh" port="node02" action="reboot"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <!-- Use libvirt virsh to fence nodes -->
  <fencedevices>
    <fencedevice name="virsh" agent="fence_virsh" ipaddr="192.168.122.1" login="root" passwd="xxxxx"/>
  </fencedevices>

</cluster>
Simultaneously start the base cluster service (cman) on both nodes; if you don't, the other node will get fenced when the post-join delay expires.
# /etc/init.d/cman start
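Before moving on, you can check membership and quorum with the standard cluster tools (sketch):
# cman_tool status     (votes, quorum and membership information)
# clustat              (summary of nodes and, later, of services)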
Once the cluster is quorate, start the secondary cluster services.
# /etc/init.d/clvm start
# /etc/init.d/rgmanager start
GFS Configuration Steps
Before starting this, you need a fully functioning quorate cluster.
[ONE] Prepare the cluster fs
Create the clustered volume group.
# pvcreate /dev/vdb2
# vgcreate vgcluster01 /dev/vdb2
Create a logical volume.
# lvcreate vgcluster01 -l100%VG -n gfs01
Create the gfs2 filesystem.
# mkfs.gfs2 -p lock_dlm -t Foo01:Gfs01 -j 3 /dev/mapper/vgcluster01-gfs01
[ALL]
Add the gfs filesystem to fstab.
/dev/mapper/vgcluster01-gfs01 /mnt/gfs01 gfs2 defaults 0 0
Create the mountpoint.
# mkdir /mnt/gfs01
Mount the filesystem.
# /etc/init.d/gfs2-tools start
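A quick check that the clustered filesystem is actually mounted on each node (sketch):
# mount | grep gfs01
This should show /dev/mapper/vgcluster01-gfs01 mounted on /mnt/gfs01 with type gfs2.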
Both nodes should now be fully functional; stop them and start them simultaneously to see if the cluster gets quorate. Currently plymouth seems completely broken, so it's impossible to see the boot messages to debug cluster initialization.
Samba Configuration Steps
Before starting this, you need a working clustered filesystem.
TODO: Samba + CTDB configuration.