Summary

Apache Hadoop stack improvements for the Maverick cycle.

Release Note

Ubuntu 10.10 comes with support for the Hadoop family of projects: hadoop, hbase and pig are now available through the Ubuntu archives.

Rationale

The Apache Hadoop project family provides reliable, scalable, distributed computing that is suitable for cloud workloads. As the distribution of choice for cloud environments, Ubuntu Server Edition needs to support this stack.

User stories

As a systems developer, I want to deploy a complete Hadoop infrastructure. I use the packages available in 10.10 and everything installs easily.

As a Hadoop user, I want to produce sequences of Map-Reduce programs. I install the pig package and am able to compile such programs.

Assumptions

None.

Design

Scope

The main contenders are:

Also part of this spec, though it is now another top-level Apache project:

Other Hadoop subprojects are maturing and should be considered for future releases:

Current situation

Which upstream?

The Apache Hadoop stable distribution (0.20) has a few noticeable shortcomings and does not work well with HBase. The 0.21 codebase, which is supposed to fix those issues, is under development and highly unstable right now. That is why Yahoo (a large Hadoop user) and Cloudera (a Hadoop solutions provider) both maintain their own distributions of a patched 0.20 codebase.

In coordination with Debian, we need to evaluate all potential codebases and pick one that is both maintainable and usable. From early contacts, it appears that most of the Cloudera patchset consists of backports of Apache JIRA fixes, which might make it a sustainable alternative over the long run.

Hadoop

HBase

ZooKeeper

Pig

Proposed objectives

Implementation

See work items on server-maverick-hadoop-pig whiteboard.

Test/Demo Plan

TBD

Unresolved issues

Codebase selection, which affects the amount of work to be done on this stack.

BoF agenda and discussion

UDS discussion notes

Worst-case scenario: have everything available in multiverse.

Test whether Hadoop works with OpenJDK.
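
As a first step toward the OpenJDK test, it helps to confirm which Java runtime a test machine actually uses before running any Hadoop jobs. The following is a minimal sketch, not part of the spec: the function names are hypothetical, and it assumes only that `java -version` prints a banner mentioning "OpenJDK" on OpenJDK builds. It identifies the runtime and does not by itself show that Hadoop works on it.

```python
import subprocess


def is_openjdk(version_text):
    """Return True if a `java -version` banner identifies an OpenJDK runtime."""
    return "openjdk" in version_text.lower()


def detect_java_runtime(java_cmd="java"):
    """Run `java -version` and return its banner, or None if java cannot be run."""
    try:
        result = subprocess.run(
            [java_cmd, "-version"], capture_output=True, text=True
        )
    except OSError:
        # Covers a missing or non-executable java binary.
        return None
    # The JVM prints its version banner to stderr rather than stdout.
    return result.stderr or result.stdout


if __name__ == "__main__":
    banner = detect_java_runtime()
    if banner is None:
        print("no usable java executable found on PATH")
    elif is_openjdk(banner):
        print("default Java runtime is OpenJDK")
    else:
        print("default Java runtime is not OpenJDK")
```

If the runtime is OpenJDK, the actual test would then consist of running the Hadoop packages and a sample job on that machine.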



HadoopPigSpec (last edited 2010-06-09 22:42:04 by dsl-173-206-3-81)