LucidTesting

Test cases for cluster components in Ubuntu 10.04

Overview

For these tests you'll need a couple of machines or KVM guests running Ubuntu 10.04. I strongly suggest three or more of them.

The steps of each test are numbered. If you follow these steps, you shouldn't have problems. Note that each step is marked with [ALL] or [ONE]. If it's marked with [ALL], you should repeat it on each server in your cluster. If it's marked with [ONE], pick one server and do that step only on that server.

Pacemaker, standalone

1. [ALL] Add testing PPA

Add this PPA to your /etc/apt/sources.list:

deb http://ppa.launchpad.net/ivoks/ppa/ubuntu lucid main
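
If you'd rather script this step, the line can be appended only when it is missing, so that repeating the step does no harm. This is a minimal sketch: the function name add_ppa_line is made up for this example, and you pass the sources file yourself.

```shell
# Sketch: append the testing PPA line to an APT sources file, but only once.
PPA_LINE='deb http://ppa.launchpad.net/ivoks/ppa/ubuntu lucid main'

add_ppa_line() {
    # $1: sources file to modify
    file=$1
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF "$PPA_LINE" "$file" 2>/dev/null; then
        printf '%s\n' "$PPA_LINE" >> "$file"
    fi
}
```

Run it as root against the file you want to change, for example add_ppa_line /etc/apt/sources.list.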

2. [ALL] install pacemaker

sudo apt-get update
sudo apt-get install pacemaker

Edit /etc/default/corosync and enable corosync by setting START=yes.
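
The same change can be made without opening an editor. This is a sketch, assuming GNU sed (as shipped with Ubuntu) and a defaults file that already contains a line beginning with START=. The function name enable_corosync is hypothetical.

```shell
# Sketch: switch START=... to START=yes in a corosync defaults file.
# Assumes the file already has a line beginning with "START=".
enable_corosync() {
    # $1: path to the defaults file (/etc/default/corosync on the servers)
    sed -i 's/^START=.*/START=yes/' "$1"
}
```

Run as root on each server: enable_corosync /etc/default/corosync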

3. [ONE] generate corosync authkey

sudo corosync-keygen

(This can take a while if there's not enough entropy. To speed it up, download an Ubuntu ISO image on the same machine while the key is being generated, or type on the keyboard to generate entropy.)

Copy /etc/corosync/authkey to all servers that will form this cluster (make sure it is owned by root:root and has mode 400).
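
After copying, you can check the owner and mode on each server. This is a sketch that relies on GNU stat from coreutils. The function name check_authkey is made up, and it compares numeric IDs, so root:root is written as 0:0.

```shell
# Sketch: verify that a key file has the expected owner and mode 400.
# Uses GNU stat; owner and group are compared as numeric IDs.
check_authkey() {
    # $1: key file; $2: expected uid:gid (default 0:0, i.e. root:root)
    file=$1
    want_owner=${2:-0:0}
    owner=$(stat -c '%u:%g' "$file") || return 1
    mode=$(stat -c '%a' "$file") || return 1
    [ "$owner" = "$want_owner" ] && [ "$mode" = "400" ]
}
```

For example: sudo stat will show the same fields by hand, while check_authkey /etc/corosync/authkey returns success only when both owner and mode match.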

4. [ALL] configure corosync

In /etc/corosync/corosync.conf replace bindnetaddr (by default it's 127.0.0.1) with the network address of your server's subnet. On a /24 network (netmask 255.255.255.0), that means replacing the last octet of the IP with 0. For example, if your IP is 192.168.1.101, then you would put 192.168.1.0.
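
For netmasks other than 255.255.255.0, the network address is the IP ANDed with the netmask, octet by octet. This is a small sketch of that arithmetic in plain shell. The function name network_addr and the default netmask are assumptions made for this example.

```shell
# Sketch: derive a network address from an IPv4 address and a netmask by
# AND-ing them octet by octet. The body runs in a subshell, written with
# ( ) instead of { }, so the IFS change does not leak into the caller.
network_addr() (
    # $1: IPv4 address; $2: netmask (default 255.255.255.0, an assumption)
    ip=$1
    mask=${2:-255.255.255.0}
    IFS=.
    # Split both dotted quads into eight positional parameters
    set -- $ip $mask
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)
```

With the address from the example, network_addr 192.168.1.101 prints 192.168.1.0.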

5. [ALL] start corosync

sudo /etc/init.d/corosync start

Now your cluster is configured and ready to monitor, stop and start your services on all your cluster servers.

6. [ALL] install services that will fail over between servers

In this example, I'm installing apache2 and vsftpd. You may install any other service...

sudo apt-get install apache2 vsftpd

Disable their init scripts:

sudo update-rc.d -f apache2 remove
sudo update-rc.d -f vsftpd remove

7. [ONE] add some services

In this example, I'll create failover for the apache2 and vsftpd services. I'll also add two additional IPs and tie apache2 to one of them, while vsftpd will be grouped with the other one.

sudo crm configure edit

If you get an empty file, close it, wait 10-20 seconds and try again. You should get something like this:

node lucidcluster1
node lucidcluster2
node lucidcluster3
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2"

Add the following lines below the 'node' declarations. Replace X.X.X.X and X.X.X.Y with the addresses that will fail over - do not put the IPs of your servers there. Do not save and exit after adding these lines:

primitive apache2 lsb:apache2 op monitor interval="5s"
primitive vsftpd lsb:vsftpd op monitor interval="5s"
primitive ip1 ocf:heartbeat:IPaddr2 params ip="X.X.X.X" nic="eth0"
primitive ip2 ocf:heartbeat:IPaddr2 params ip="X.X.X.Y" nic="eth0"
group group1 ip1 apache2
group group2 ip2 vsftpd
order apache_after_ip inf: ip1:start apache2:start
order vsftpd_after_ip inf: ip2:start vsftpd:start

Now that you've put some services into configuration, you should also define how many servers are needed for a quorum and what stonith devices will be used. For this test, we won't use stonith devices.

Under property, set expected-quorum-votes and add stonith-enabled, so that it looks like this (don't forget the '\' at the end of each continued line!). Replace 'X' with the number of servers needed for quorum. With N servers in the cluster, X should be less than or equal to N-1, and it should not be 1 unless there are only two servers in the cluster:

property $id="cib-bootstrap-options" \
        dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="X" \
        stonith-enabled="false"

Save and quit.
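
The rule for choosing X can be written down as a small check. This is only a sketch: the function name valid_quorum_votes is made up, and the lower bound of 1 is an assumption added for this example, not something stated by the cluster tools.

```shell
# Sketch of the rule for expected-quorum-votes:
# X <= N-1, and X = 1 is only acceptable in a two-node cluster.
valid_quorum_votes() {
    # $1: N, number of servers; $2: X, candidate expected-quorum-votes
    n=$1
    x=$2
    # Assumed lower bound: at least one vote
    [ "$x" -ge 1 ] || return 1
    [ "$x" -le $(( n - 1 )) ] || return 1
    if [ "$x" -eq 1 ] && [ "$n" -ne 2 ]; then
        return 1
    fi
    return 0
}
```

For a three-server cluster, valid_quorum_votes 3 2 succeeds, while valid_quorum_votes 3 1 and valid_quorum_votes 3 3 fail.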

8. [ALL] monitor and stress test

On each server start crm_mon (sudo crm_mon) and monitor how services are grouped and started. Then, one by one, reboot or shut down servers, leaving at least one running.

First test with normal shutdown, then with pulling the AC plug (destroying domains in KVM).

In all these cases, once the servers are back up, they should show as Online after some time (monitor server status in crm_mon). Services should migrate between them without problems.
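
To check the Online list from a script instead of watching the interactive display, you can parse one-shot output. This is a minimal sketch, assuming crm_mon prints a line of the form Online: [ node1 node2 ]. The helper name online_nodes is hypothetical.

```shell
# Sketch: list the nodes that crm_mon reports as Online.
# Reads crm_mon text output on stdin and prints one node name per line.
online_nodes() {
    sed -n 's/^Online: \[ *\(.*[^ ]\) *].*$/\1/p' | tr ' ' '\n'
}
```

Feed it one-shot output, for example: sudo crm_mon -1 | online_nodes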

Pacemaker with DRBD

1. Complete test with standalone Pacemaker

2. [ALL] Install DRBD and other needed tools

sudo apt-get install linux-headers-server psmisc
sudo apt-get install drbd8-utils

Since we will be using pacemaker to stop and start drbd, remove it from the runlevels:

sudo update-rc.d -f drbd remove

3. [ALL] Set up DRBD

Create /etc/drbd.d/disk0.res file, containing:

resource disk0 {
        protocol C;
        net {
                cram-hmac-alg sha1;
                shared-secret "lucid";
        }
        on lucidclusterX {
                device /dev/drbd0;
                disk /dev/sdXY;
                address X.X.X.X:7788;
                meta-disk internal;
        }
        on lucidclusterY {
                device /dev/drbd0;
                disk /dev/sdXY;
                address X.X.X.Y:7788;
                meta-disk internal;
        }
}

Make sure to replace lucidclusterX|Y with real hostnames of your two servers. Change X.X.X.X and X.X.X.Y to real IPs of those servers and sdXY to real partitions that will be used for drbd.
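
If you set up several test clusters, filling in the placeholders by hand gets tedious. This is a sketch that prints the same resource definition from six arguments. The function name drbd_resource and the argument order are made up for this example.

```shell
# Sketch: print the disk0 resource definition for two nodes.
# Arguments: host1 ip1 disk1 host2 ip2 disk2 (all values supplied by you).
drbd_resource() {
    cat <<EOF
resource disk0 {
        protocol C;
        net {
                cram-hmac-alg sha1;
                shared-secret "lucid";
        }
        on $1 {
                device /dev/drbd0;
                disk $3;
                address $2:7788;
                meta-disk internal;
        }
        on $4 {
                device /dev/drbd0;
                disk $6;
                address $5:7788;
                meta-disk internal;
        }
}
EOF
}
```

For example, with hypothetical addresses and partitions: drbd_resource lucidcluster1 192.168.1.101 /dev/sdb1 lucidcluster2 192.168.1.102 /dev/sdb1, with the output redirected to a file that you then install as /etc/drbd.d/disk0.res on both servers.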

Once you have saved that file, create the metadata for the resource:

sudo drbdadm create-md disk0

You should get:

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Finally, start drbd:

sudo /etc/init.d/drbd start

sudo drbdadm status should return:

<resource minor="0" name="disk0" cs="Connected" ro1="Secondary" ro2="Secondary" ds1="Inconsistent" ds2="Inconsistent" />

4. [ONE] Create filesystem

One of your servers will act as the primary server to start with. You'll use it to create the filesystem and force the other node to sync from it. On the chosen server, force it to be primary and create the filesystem:

sudo drbdadm -- --overwrite-data-of-peer primary disk0
sudo mkfs.ext3 /dev/drbd/by-res/disk0

5. [ONE] DRBD+Pacemaker

Edit pacemaker configuration:

sudo crm configure edit

and add:

primitive drbd_disk ocf:linbit:drbd \
        params drbd_resource="disk0" \
        op monitor interval="15s"
primitive fs_drbd ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/disk0" directory="/mnt" fstype="ext3"
ms ms_drbd drbd_disk \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location loc-1 fs_drbd -inf: lucidclusterX
location loc-2 drbd_disk -inf: lucidclusterX
colocation mnt_on_master inf: fs_drbd ms_drbd:Master
order mount_after_drbd inf: ms_drbd:promote fs_drbd:start

Replace lucidclusterX with the hostname of the node that doesn't have DRBD. Save and fire up crm_mon. You should get something like this:

============
Last updated: Wed Jan 13 18:03:12 2010
Stack: openais
Current DC: lucidcluster2 - partition with quorum
Version: 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
3 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ lucidcluster2 lucidcluster3 lucidcluster1 ]

 Resource Group: group1
     ip1        (ocf::heartbeat:IPaddr2):       Started lucidcluster2
     apache2    (lsb:apache2):  Started lucidcluster2
 Resource Group: group2
     ip2        (ocf::heartbeat:IPaddr2):       Started lucidcluster3
     vsftpd     (lsb:vsftpd):   Started lucidcluster3
 Master/Slave Set: ms_drbd
     Masters: [ lucidcluster2 ]
     Slaves: [ lucidcluster1 ]
fs_drbd (ocf::heartbeat:Filesystem):    Started lucidcluster2

6. [ALL] Testing

Wait for drbd disks to get synced and start rebooting/killing your nodes.
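
One way to wait for the sync is to poll the status output until both disk states are UpToDate. This is a sketch, assuming the single-line resource format that drbdadm status prints in this DRBD version. The helper name drbd_synced is hypothetical.

```shell
# Sketch: succeed when the "drbdadm status" line for disk0 shows both
# disks as UpToDate. Reads the status output on stdin.
drbd_synced() {
    grep 'name="disk0"' | grep 'ds1="UpToDate"' | grep -q 'ds2="UpToDate"'
}
```

A polling loop could then look like: until sudo drbdadm status | drbd_synced; do sleep 10; done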

Test results

||Name||Test||Passed/Failed||Comments||

Questions

ClusterStack/LucidTesting (last edited 2012-02-15 17:31:40 by soho85-138)
