StephanPeijnik

Contact information

Project

  • Project Name: Network Wide Updates
  • Project Description:
    • Network Wide Updates (nwu) is about distributing packages on a local network more efficiently than having every client download its updates directly from the Ubuntu mirrors. The idea has been worked on before, with little outcome. The old project is described at https://wiki.ubuntu.com/NetworkWideUpdates. The main idea described on that wiki page forms the basis of this project, but remote configuration, remote package installation and similar features are outside its core scope. The first priority of this project is package distribution.

      There are three possible ways nwu could work: a Client-Server architecture, a Peer2Peer architecture, or a hybrid of the two. The Client-Server approach can be viewed as the traditional way of distributing updates in a local network. One or more central servers act as caches for package information and packages. These servers refresh the package information (download the Packages and Sources files) at a given interval and issue security update alerts to all clients on the network. Additionally, an nwu server acts as a local package cache, similar to apt-proxy.

      The Peer2Peer approach, on the other hand, turns every Ubuntu system on the local network into a cache, much like the server in the Client-Server approach. Every system running nwu in Peer2Peer mode announces its existence to the whole network. Other nodes may then connect and ask it to provide updated package information or packages (i.e. access to the files in /var/cache/apt/archives and /var/lib/apt/lists/).

      A hybrid mode is also possible, where one nwu server and several peer2peer nodes exist on the network. The nwu server does the initial download, but once a peer2peer node has received a file that another node needs, the two nodes can exchange it directly rather than going through the server. This should also help reduce the load on the nwu server.

      All three methods should work without further configuration of client systems, apart from installing the nwu support packages. The main benefit of nwu is decreased bandwidth usage on internet-facing network connections: instead of 50 clients downloading a security update from the same HTTP server at the same time, one client (or an nwu server) downloads the file once and it is then distributed directly on the local network.

      The main task of this project is implementing the software needed for the three modes of operation above. If things go well, I could imagine extending the scope of the project by adding remote package installation to the nwu server (along with forced updating of clients). The programming language chosen for this project is Python where possible, and C/C++ where needed.

      • USE CASES

        • Alice has two Ubuntu systems on her home network. Both need to download a security update. System A first downloads the file using the fall-back mechanism in nwu (direct download). System B then wants to download the file too. It tries to find the file via nwu and notices that System A already has it on disk. System B now downloads the file directly from System A.
        • Bob is an administrator at a company running some 50 Ubuntu systems. He installs an nwu server and installs the nwu client on all systems. A new security update is issued through the Ubuntu update channels. The nwu server refreshes its package lists periodically and notifies all clients once it has finished downloading a new package list. System A now tries to download the package. The nwu server tells the system it has the file on disk (even though it does not yet), starts downloading the file and caches it on disk. System A then downloads the update from the nwu server. Systems B to Z also want to install the update; they contact the nwu server and are told the file is on disk. They now download the file from the nwu server, without generating any outbound traffic.
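The server behaviour from Bob's use case (always claim to have a file, fetch it from a mirror on the first request, then serve every later request from the local cache) could be sketched in Python roughly as follows. `CachingServer` and the `fetch_upstream` callable are hypothetical names; the actual mirror download and the network transport are left out, and so is path sanitising.

```python
import os
import threading
from typing import Callable, Dict


class CachingServer:
    """Hypothetical nwu server cache: every file is 'available'; misses are fetched once."""

    def __init__(self, cache_dir: str, fetch_upstream: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch_upstream = fetch_upstream  # assumed: downloads a path from a mirror
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def has_file(self, path: str) -> bool:
        # The server always answers "yes", even before the file is on disk.
        return True

    def _lock_for(self, path: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(path, threading.Lock())

    def get_file(self, path: str) -> bytes:
        # No protection against ".." in this sketch.
        local = os.path.join(self.cache_dir, path.lstrip("/"))
        # One lock per file, so concurrent clients trigger a single upstream download.
        with self._lock_for(path):
            if not os.path.exists(local):
                data = self.fetch_upstream(path)
                os.makedirs(os.path.dirname(local), exist_ok=True)
                with open(local, "wb") as f:
                    f.write(data)
        with open(local, "rb") as f:
            return f.read()
```

With 50 clients asking for the same .deb, `fetch_upstream` runs once and every other request is served from disk, which is exactly where the bandwidth saving comes from.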
  • If you would be willing and able to do other projects instead, which ones?
    • (In order of preference)
      • "Implement an Archive Crawler"
      • "Apt multiarch"
  • Why did you like this idea?
    • Because I have already worked on NWU in the past, namely on the now-defunct project led by Yves Junqueira. Additionally, I have a use case for this project myself: I run bleeding-edge Ubuntu installations on various systems that need to be updated daily.
  • Please describe a tentative project architecture or an approach to it:
    • Every nwu-enabled client runs an additional process, the nwu broker. This process is responsible for requesting files from neighbour nwu brokers or a stand-alone nwu server. The idea is to make the nwu broker act as a proxy for apt, so that no changes to apt itself are needed. The nwu broker tries to find other nwu brokers on the local network (through multicast, possibly via Avahi) and connects to them. When apt tries to download a file, the broker asks all neighbour brokers whether they have it. If they do, one neighbour is selected and the file is downloaded from there. If no neighbour has the requested file, the broker falls back to the conventional way of downloading the file directly. Brokers use /var/lib/apt/lists/ and /var/cache/apt/archives as their package caches. An nwu server acts just like a broker, but always tries to provide a requested file, even if it has to download the file itself first and cache it on disk. Additionally, the server can download package lists itself, process them and notify clients once a new list is available. The nwu server uses a package cache location separate from /var/lib/apt/lists/ and /var/cache/apt/archives.
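The broker's lookup order (ask the neighbours, take one that has the file, otherwise fall back to a direct download) could look roughly like this. `broker_fetch` is a hypothetical name, and the three callables stand in for the multicast query, the neighbour transfer and the conventional mirror download, none of which are specified yet.

```python
from typing import Callable, List


def broker_fetch(
    path: str,
    neighbours: List[str],
    neighbour_has: Callable[[str, str], bool],
    download_from_neighbour: Callable[[str, str], bytes],
    download_direct: Callable[[str], bytes],
) -> bytes:
    """Hypothetical broker logic: prefer a neighbour, fall back to the mirror."""
    # Neighbours are assumed to be pre-sorted by the neighbour selection algorithm.
    candidates = [n for n in neighbours if neighbour_has(n, path)]
    for node in candidates:
        try:
            return download_from_neighbour(node, path)
        except OSError:
            # The neighbour went away or the transfer failed; try the next one.
            continue
    # No neighbour could serve the file: use the conventional direct download.
    return download_direct(path)
```

Treating a failed neighbour transfer like a miss keeps the fall-back path the same in both cases, so apt never sees a difference beyond download speed.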

      Neighbor NWU brokers and/or servers are discovered using Avahi. The service name should be "nwu", so finding NWU nodes simply involves looking up _nwu._tcp.<domain> via Avahi. The service's TXT record needs to include the following information: architecture (e.g. amd64, i386, all, ...), release codename (lucid, karmic, ...), an optional rate limit in KB/s, and whether the node is an NWU server or a peer2peer node. This allows a client to build up a list of prospective neighbors.
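A sketch of how the TXT record contents listed above might be encoded and decoded. DNS-SD TXT records are a list of "key=value" strings with case-insensitive keys; the key names used here (`arch`, `release`, `ratelimit`, `type`) are hypothetical, as are the defaults (a missing `ratelimit` means no limit, a missing `type` means a peer2peer node). Publishing and browsing through Avahi itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeInfo:
    arch: str            # e.g. "amd64", "i386", "all"
    release: str         # e.g. "lucid", "karmic"
    ratelimit_kbs: int   # 0 means "no rate limit" in this sketch
    is_server: bool      # NWU server (True) or peer2peer node (False)


def to_txt(info: NodeInfo) -> List[str]:
    """Encode node information as DNS-SD style "key=value" TXT strings."""
    return [
        f"arch={info.arch}",
        f"release={info.release}",
        f"ratelimit={info.ratelimit_kbs}",
        f"type={'server' if info.is_server else 'p2p'}",
    ]


def from_txt(entries: List[str]) -> NodeInfo:
    """Decode TXT strings received while browsing for _nwu._tcp services."""
    fields: Dict[str, str] = {}
    for entry in entries:
        key, _, value = entry.partition("=")
        fields[key.lower()] = value  # DNS-SD keys are case-insensitive
    return NodeInfo(
        arch=fields.get("arch", "all"),
        release=fields.get("release", ""),
        ratelimit_kbs=int(fields.get("ratelimit") or 0),
        is_server=fields.get("type") == "server",
    )
```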

    • Neighbor selection should use an algorithm that takes into account the node type (server/peer2peer), the network distance (number of hops to the target) and any rate limit imposed by the node. The best possible node would thus be a server on the same subnet (no hops) without any rate limiting, followed by a peer2peer node on the same subnet without rate limiting, and so forth. Discovering which node has a requested file should work via a special-purpose UDP multicast protocol. This protocol only needs a few simple messages, like "Who has file X?" or "Who has Packages file X from repository Y that is newer than Z?". If the repository is a "generic" one (e.g. the Ubuntu repository), an identifier (possibly of the form ubuntu:lucid or similar) may be used instead of the actual repository URL, so that sharing package files also works between clients using different Ubuntu mirrors. The protocol for the actual file transfer has not been determined yet. One possible solution would be implementing minimalistic HTTP servers in all brokers. Another approach would be designing a minimal TCP-based file transfer protocol with a binary representation, to save even more bandwidth and to make parsing requests easier and less CPU-intensive. A typical message exchange over the network:
      • Avahi lookup of possible neighbours.
      • Multicast message asking for a file or a Packages file.
      • All nodes answer, telling the nwu broker whether or not they have the file.
      • Download from the nearest neighbor that gave a positive answer, or direct download if the file is not available from any neighbor.
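Steps two to four of this exchange can be sketched with very small messages. The sketch assumes a JSON encoding and hypothetical field names (`op`, `file`, `repo`), since the wire format is still open; it also encodes one reading of the selection rule above as a sort key (servers first, then fewer hops, then no rate limit, then a higher limit; ordering limited nodes by their limit is an assumption). Sending the datagrams over a UDP multicast socket and measuring hop counts are not shown.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Directories a peer2peer node serves from, as described in the proposal.
DEFAULT_CACHE_DIRS = ("/var/cache/apt/archives", "/var/lib/apt/lists")


@dataclass
class Neighbor:
    name: str
    is_server: bool
    hops: int           # network distance; 0 means same subnet
    ratelimit_kbs: int  # 0 means "no rate limit" (assumption)


def make_query(filename: str, repo: Optional[str] = None) -> bytes:
    """Step 2: build a hypothetical 'who has file X' multicast payload."""
    msg = {"op": "who-has", "file": filename}
    if repo is not None:
        msg["repo"] = repo  # generic identifier such as "ubuntu:lucid"
    return json.dumps(msg).encode("utf-8")


def answer_query(payload: bytes, cache_dirs: Sequence[str] = DEFAULT_CACHE_DIRS) -> bytes:
    """Step 3: a node reports whether the requested file is in its caches."""
    msg = json.loads(payload.decode("utf-8"))
    if msg.get("op") != "who-has":
        raise ValueError("unsupported message: %r" % msg.get("op"))
    # Use only the base name so a query cannot point outside the cache dirs.
    name = os.path.basename(msg["file"])
    have = any(os.path.isfile(os.path.join(d, name)) for d in cache_dirs)
    return json.dumps({"op": "have" if have else "not-here", "file": name}).encode("utf-8")


def rank_key(n: Neighbor) -> tuple:
    """Servers first, then fewer hops, then no rate limit, then a higher limit."""
    return (not n.is_server, n.hops, n.ratelimit_kbs != 0, -n.ratelimit_kbs)


def choose_source(answers: Dict[str, bytes], neighbors: List[Neighbor]) -> Optional[Neighbor]:
    """Step 4: best neighbor that answered 'have', or None for a direct download."""
    positive = {name for name, raw in answers.items()
                if json.loads(raw.decode("utf-8")).get("op") == "have"}
    candidates = [n for n in neighbors if n.name in positive]
    return min(candidates, key=rank_key) if candidates else None
```

Using a tuple as the sort key makes the priority order explicit: each criterion only breaks ties left by the ones before it.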
      The nwu server could offer several modes of operation, for example:
      • Cache-only mode
        • What you would expect from apt-proxy, plus auto-discovery.
      • Mirror mode
        • A complete mirror of the archive or a subset thereof.
      • Intelligent Mirror mode
        • This is probably the most advanced mode and, as far as I know, has not been implemented anywhere yet. The nwu server would be given a list of packages it should always keep up to date, so that they can be served immediately. The server would then download all of these packages and their dependencies. Using this technique with the ubuntu-minimal package, for example, would keep the packages of a minimal system always present and up to date on the nwu server.
      Once implemented, the latter two modes may need to be reflected in both the neighbor selection algorithm and the Avahi TXT records.
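At its core, the Intelligent Mirror mode computes the closure of a package list under its dependencies. A minimal sketch, assuming the dependency information has already been extracted from the Packages files into a plain dictionary; real Depends fields also contain alternatives (`a | b`), version constraints and virtual packages, which this sketch ignores. The `dependency_closure` name is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def dependency_closure(wanted: Iterable[str], depends: Dict[str, List[str]]) -> Set[str]:
    """Return the wanted packages plus everything they (transitively) depend on."""
    result: Set[str] = set()
    queue = deque(wanted)
    while queue:
        pkg = queue.popleft()
        if pkg in result:
            continue  # already visited; also guards against dependency cycles
        result.add(pkg)
        queue.extend(depends.get(pkg, []))
    return result
```

For example, with a made-up dependency map in which ubuntu-minimal depends on adduser and apt, and apt depends on libc6, the closure of ["ubuntu-minimal"] is those four packages, which the server would then keep mirrored.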
  • Give us details about the milestones for this project
    • Milestone0: Evaluate requirements; specify the protocol; check what needs to be done for integration into apt.
      • This milestone consists of determining the exact requirements, specifying the protocol to be used by nwu and checking whether Avahi can be used for announcing nodes on the network or whether a native announcement protocol needs to be created. Additionally, apt needs to be examined to see how nwu could be integrated into it: as a separate protocol, as a proxy, etc.
    • Milestone1: Working peer2peer infrastructure.
      • This covers creating everything needed for the peer2peer infrastructure: implementing the protocol as specified in Milestone0, writing the peer2peer node program and integrating everything into apt (see above).
    • Milestone2: Working nwu server.
      • After getting a working peer2peer infrastructure the next logical step is creating the stand-alone nwu server. This server should consist of a package cache and a web-frontend for accessing statistics (and configuration tasks later on).
    • (Milestone3: remote updates through server)
      • If time permits, the third milestone should add to both the server and the client everything needed to force updates onto clients. This means that the server needs to be extended to be aware of all packages installed on each client.
    • (Milestone4: remote package installation)
      • Again, if time permits, this milestone should cover forced package installation on a client through the server. In big networks, for example, this would allow the admin to install an application that user X wants on system Y by using the nwu server web interface. This requires the server to track the packages installed on each client, as described in Milestone3.
  • Why will your proposal benefit Ubuntu?
    • Mainly, nwu should benefit Ubuntu users by helping them save bandwidth. There is also a lot of potential benefit for network administrators, allowing them to keep Ubuntu systems from downloading packages directly from the internet and to keep a lot of traffic local. Ubuntu itself would gain an enterprise feature that is already available for other operating systems (think of Microsoft Windows Server Update Services).

Open Source

  • Please describe any previous Open Source development experience
    • I have been involved with Free Software for about 9 years now. Besides writing some more or less minor pieces of software, my main involvement has been helping the GNU Savannah administration team, packaging Python modules for Debian within the Debian Python Modules Team and working on update-manager for Debian during last year's Summer of Code.
  • Why are you interested in Open Source?
    • My main interest can probably be attributed to being a tinkerer. I always liked taking things apart, trying to understand the underlying concepts, modifying them and watching the outcome. Furthermore, I am a heavy Free Software user, both on the desktop and on servers, and I believe in the principle of sharing code and working together.

Availability

  • How long will the project take? When can you begin?
    • I am not entirely sure how long the project will take, but two to three months sounds like a good estimate. I could begin immediately, working in my spare time until the semester ends at the end of June.
  • How much time do you expect to dedicate to this project? (weekly)
    • 10-20 hours during university time (until the end of June) and 40+ hours afterwards.
  • Where will you be based during the summer?
    • I will be based in my home town, Villach, Austria, during the summer.
  • Do you have any commitments for the summer? (holidays/work/summer courses)
    • I would like to go on a trip for about a week sometime in either July or August, but I have no exact plans on that yet. Apart from that I do not have any commitments.
  • Please designate a back up student (in case you need to withdraw your application)

Other

  • Have you ever participated in a previous GSoC? (describe your project)
    • I participated in 2009, working on a distribution-independent version of update-manager for the Debian project. My work consisted of splitting update-manager into three parts: a frontend (the UI), a backend (python-apt) and a distribution-specific module. Additionally, I replaced the synaptic backend used at the time with one using python-apt directly. My mentor was Michael Vogt.
  • Have you applied for any other 2010 Summer of Code projects? If yes, which ones?
    • No, none.
  • Why did you apply for the Google Summer of Code?
    • Because I enjoyed working on Free Software during last year's Summer Of Code and really liked being able to plan my work week myself, instead of having strict office hours, so I could work when I was productive.
  • Why did you choose Ubuntu as a mentoring organisation?
    • Ubuntu came as a logical choice because I am an Ubuntu user and like the way Ubuntu has helped GNU/Linux to become not only more user-friendly, but more widely adopted.
  • Why do you want to participate and why should Ubuntu choose you?
    • Because I would like to work on Free Software during the summer and help make Ubuntu better this way. I believe I should be chosen because of my skills, my experience with Debian packaging and its internals, and my previous successful work during Summer of Code.

GSoC/2010/StephanPeijnik (last edited 2010-04-16 12:25:22 by 62-47-23-35)