HadoopSpec

  • Launchpad Entry: servercloud-p-hdp-hadoop

  • Created: 2011-10-24

  • Contributors: James Page

  • Packages affected: New Packaging

Summary

Release Note

Ubuntu now features Apache Hadoop - a software platform that lets one easily write and run applications that process vast amounts of data.

Rationale

Apache Hadoop has gained widespread adoption. The various flavours of Hadoop appear to be consolidating, and Cloudera have transferred a number of their Hadoop-related projects to Apache, including Bigtop (Cloudera's packaging for Red Hat, Debian and SUSE).

Packaging Hadoop for Ubuntu would support the development of a set of rock-solid Juju charms for Hadoop by providing packaging that is well integrated with Ubuntu.

Collaboration with Apache Bigtop could also help with packaging the wider family of Hadoop-related projects.

User stories

Marcos has been tasked with setting up a new Hadoop compute cluster to support log analysis of his company's website activity. He is able to quickly and easily deploy Apache Hadoop on his company's internal cloud using Juju, with packages that come directly from the Canonical partner archive (as mandated by his company's IT policy).

Natalie wants to set up a Hadoop cluster. She is not able to use Juju and a public cloud because of the infrastructure policies and design already in place at her organisation, but she is able to deploy the required packages directly from the Canonical partner archive.

Assumptions

  • Ubuntu will package Apache Hadoop (rather than one of its many variants).
  • Projects that will be packaged: hadoop, hbase, hive, hcatalog, pig, zookeeper.
  • Packaging will align with Apache Bigtop (which is based on the most popular upstream packaging).
  • Packaging will focus on the most recent stable release of Hadoop.
  • Configuration methods should take into account integration with configuration management tools such as Puppet and Chef.
  • Native integrations must be part of the packaging.
  • Packages will target the Canonical Partner archive for this release.
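The Puppet/Chef assumption is easier to meet if Hadoop's XML site files can be generated from plain key/value data. The sketch below shows one way a configuration tool or packaging helper might do that. It assumes only the standard `<configuration>`/`<property>`/`<name>`/`<value>` layout of Hadoop's `*-site.xml` files; the helper name `render_site_xml` and the example values are hypothetical and not part of the planned packaging.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render {property name: value} as a Hadoop-style *-site.xml string.

    Hypothetical helper for illustration; not part of any Hadoop or Bigtop API.
    Property names are sorted so repeated runs produce identical output.
    """
    root = ET.Element("configuration")
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Hadoop reads every value as text, so numbers are stringified here.
        ET.SubElement(prop, "value").text = str(properties[name])
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


# Placeholder values for a core-site.xml (Hadoop 1.x-era property names).
print(render_site_xml({
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "hadoop.tmp.dir": "/var/lib/hadoop/tmp",
}))
```

Sorting the property names keeps the output byte-for-byte stable between runs, so a tool such as Puppet or Chef would only rewrite the file (and trigger service restarts) when a value actually changes.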

Obsolete Assumptions

  • The majority of Java dependencies can be satisfied by packages already in the archive (see hadoop-dependency-report.tar.gz).
    • kfs: this can be excluded to disable the feature, but it does not look like much work to package.
    • Apache ftpserver would be required to enable smoke testing; again, it looks OK to package.
  • Focus will be on a solid Hadoop core with contrib packages if time permits.
    • Most dependencies are already in the archive apart from thrift.
  • Packages will target universe for this release.

Design

You can have subsections that better describe specific parts of the issue.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.
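One possible smoke test, offered as an assumption rather than an agreed plan, is the classic word count run through Hadoop Streaming, which lets any executable that reads stdin and writes stdout act as the mapper or reducer. It also resembles the log-analysis use case above. The script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical; `run_locally` mimics the map, sort and reduce phases in a single process so the logic can be checked before a cluster exists.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum counts per word; input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(text):
    """Simulate map -> sort (Hadoop's shuffle) -> reduce in one process."""
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "wordcount.py map" as the streaming mapper,
    # "wordcount.py reduce" as the streaming reducer.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

On a cluster the script would be passed to the streaming jar with the `-mapper`, `-reducer`, `-input`, `-output` and `-file` options; the jar's install path depends on the final packaging and is not fixed here. Comparing the cluster output with `run_locally` on the same small input gives a quick pass/fail check.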

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.


CategorySpec

ServerTeam/Specs/HadoopSpec (last edited 2012-01-13 11:45:37 by james-page)