Hadoop

Overview

During the 12.04 cycle, Hadoop and a selection of related components will be packaged to support Hadoop charming and will be available in PPAs:

  • hadoop 1.0.1
  • hive 0.8.1
  • pig 0.9.2
  • hbase 0.92.0 (maybe 0.92.1)
  • zookeeper 3.4.3

These components are secondary and may be packaged:

  • hcatalog 0.2.0

PPAs

A team (http://launchpad.net/~hadoop-ubuntu) and three PPAs have been set up on Launchpad. All three PPAs have been enabled for armel, armhf and powerpc, so please check before adding anyone else to the team.

dev

ppa:hadoop-ubuntu/dev

Development versions of packages: new versions land here before they are fully tested in the testing PPA.

testing

ppa:hadoop-ubuntu/testing

Versions of packages which are ready for testing.

stable

ppa:hadoop-ubuntu/stable

Stable versions of packages which have been tested.
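
As a rough sketch of how the three PPAs might be selected from a script, the helper below maps a stage name to its PPA identifier. The function name ppa_for is hypothetical and not part of any package; the add-apt-repository call in the comment is the usual way to enable a PPA on Ubuntu.

```shell
#!/bin/sh
# ppa_for STAGE: print the PPA identifier for one of the three stages
# described above. Hypothetical helper for illustration only.
ppa_for() {
    case "$1" in
        dev|testing|stable)
            echo "ppa:hadoop-ubuntu/$1"
            ;;
        *)
            echo "unknown stage: $1" >&2
            return 1
            ;;
    esac
}

# Example usage (not executed here, needs root and network access):
#   sudo add-apt-repository "$(ppa_for stable)"
#   sudo apt-get update
```

Rejecting unknown stage names keeps a typo from silently pointing apt at a PPA that does not exist.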

Packaging Details

Initial Packaging

Packages should be based on the source packages built by bigtop (http://incubator.apache.org/bigtop http://github.com/apache/bigtop). However, most of those packages target an older debhelper, so debian/rules etc. should be rationalised to use newer features. To start from the bigtop packaging:

git clone https://github.com/apache/bigtop
mkdir -p hadoop/debian
cp bigtop/bigtop-packages/src/deb/hadoop/* hadoop/debian
cp bigtop/bigtop-packages/src/common/hadoop/* hadoop/debian

Packages should ship upstart configurations; see hadoop for examples.

Upstream Source

Packages should be based on upstream binary distributions. Java components should not be rebuilt, but native components will need to be rebuilt with appropriate patches for precise and the official ports.

Packages should define a target in debian/rules called get-orig-source which pulls the correct version of the upstream distribution from an appropriate upstream source. Ideally checksums should be validated.
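
The checksum validation mentioned above could look roughly like the following shell helpers, which a get-orig-source target might call. This is a minimal sketch, not the packaging's actual implementation: the function names, the mirror URL and the expected digest in the usage comment are all placeholders.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED: succeed only if FILE's SHA-256 digest
# equals EXPECTED. Hypothetical helper name.
verify_sha256() {
    actual=$(sha256sum "$1" 2>/dev/null | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# fetch_orig URL EXPECTED OUTFILE: download the upstream tarball and keep
# it only if the checksum matches. Hypothetical helper name.
fetch_orig() {
    wget -q -O "$3" "$1" || return 1
    if ! verify_sha256 "$3" "$2"; then
        echo "checksum mismatch for $3" >&2
        rm -f "$3"
        return 1
    fi
}

# Example usage (placeholder URL and digest, not executed here):
#   fetch_orig "http://mirror.example.org/hadoop-1.0.1-bin.tar.gz" \
#       "<expected sha256>" ../hadoop_1.0.1.orig.tar.gz
```

Deleting the file on a mismatch avoids leaving a bad tarball where a later build could pick it up.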

Version Control

Packaging-only branches (just the debian folder) should be created for all packages and stored under the hadoop-ubuntu team, e.g. lp:~hadoop-ubuntu/ubuntu/precise/hadoop/trunk

Packages should use source format 3.0 (quilt) and should have packaging version numbers of the form 0.20.205.0-0ubuntu1~hadoopX. The ~hadoopX suffix supports multiple upload iterations to the PPAs.
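
To illustrate why the ~hadoopX suffix works for repeated uploads, the sketch below uses dpkg --compare-versions to check the ordering. In Debian version comparison a tilde sorts before everything, including the end of the string, so each ~hadoopX upload supersedes the previous one while still sorting below the same version without the suffix. The function name sorts_before is hypothetical.

```shell
#!/bin/sh
# sorts_before A B: succeed if Debian version A is strictly lower than B.
# Hypothetical helper wrapping dpkg's own comparison.
sorts_before() {
    dpkg --compare-versions "$1" lt "$2"
}

# A later PPA iteration supersedes an earlier one.
sorts_before "0.20.205.0-0ubuntu1~hadoop1" "0.20.205.0-0ubuntu1~hadoop2" \
    && echo "hadoop1 < hadoop2"

# Any ~hadoopX iteration sorts below the same version without the suffix.
sorts_before "0.20.205.0-0ubuntu1~hadoop2" "0.20.205.0-0ubuntu1" \
    && echo "hadoop2 < unsuffixed"
```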

Building Packages Locally

Packages should be buildable using the following procedure:

bzr branch lp:~hadoop-ubuntu/ubuntu/precise/hadoop/trunk hadoop
cd hadoop
./debian/rules get-orig-source
bzr bd -S
cd ..
sbuild -A -d precise hadoop*.dsc

Uploading Packages to PPAs

Once an initial orig.tar.gz has been uploaded to the PPA, subsequent packaging updates can be uploaded using:

cd hadoop
bzr bd -S -- -sd
cd ..
dput ppa:hadoop-ubuntu/dev hadoop*_source.changes

Finding JAVA_HOME

Hadoop and friends need reliable JAVA_HOME detection and setup; this is provided by the bigtop-utils package (lp:~hadoop-ubuntu/+junk/bigtop-utils).

. /usr/lib/bigtop-utils/bigtop-detect-javahome

Most of the bigtop packages use this in the provided init scripts. See hadoop for examples.
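
A script that depends on the detection step might wrap it defensively, along these lines. This is only a sketch: the function name load_java_home is hypothetical, and it assumes the detection script sets JAVA_HOME when sourced, as described above.

```shell
#!/bin/sh
# load_java_home [SCRIPT]: source the bigtop detection script (or an
# alternative path given as $1) and fail loudly if JAVA_HOME is still
# unset afterwards. Hypothetical helper for illustration only.
load_java_home() {
    detect_script=${1:-/usr/lib/bigtop-utils/bigtop-detect-javahome}
    if [ ! -r "$detect_script" ]; then
        echo "cannot read $detect_script" >&2
        return 1
    fi
    . "$detect_script"
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME could not be detected" >&2
        return 1
    fi
    export JAVA_HOME
}

# Example usage in an init script:
#   load_java_home || exit 1
#   exec "$JAVA_HOME/bin/java" -version
```

Checking the result explicitly means a missing JVM is reported at startup instead of surfacing later as an obscure failure.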

ServerTeam/Hadoop (last edited 2012-03-09 17:45:28 by james-page)