UrbanSkudnik

Urban Skudnik

Contact information

Project

  • Project Name: Home user backup solution (deja-dup improvements (Was "Home User Backup Spec"))
  • Project Description:

The fundamental idea of the project is that backup should be introduced to home users who still (at least in part) believe that backup and its maintenance are rocket science. Many attempts have been started and many solutions are already available, but Ubuntu has yet to implement a backup solution that is simple enough for home users who have neither the time nor the experience to maintain a Linux operating system (or any operating system, for that matter) and that is at the same time properly integrated with Ubuntu.

An important factor in the project will be the use of existing tools, since a solution built completely from scratch would take on the order of man-years to achieve proper reliability (which is by far the most important property of a backup). I have chosen to use Deja Dup.

For a proper solution, I will have to work on Deja Dup itself (for example, it still does not support restoring a single file, it does not check whether the disk has enough space for the backup, etc.) and with Ubuntu's migration assistant (the user will have to be presented with an option to create a backup at the next Ubuntu upgrade that supports my backup solution). Since the solution will have to be integrated with Ubuntu's GUI, I will also have to work with Gnome and its file manager (Nautilus).

Deja Dup seems to be particularly well suited for my project since it supports backup to external hard drives and to Amazon S3/Ubuntu One. The backup solution should also support quick restore of system settings so that if the user loses his hard drive he can be up to speed as quickly as possible and restore his data later on.

  • If you would be willing and able to do other projects instead, which ones?

I haven't done proper research or discussed other ideas with mentors, but my other alternative was "Testdrive Front End", which seems to be taken already. A similar project to which at least a part of my ideas could be applied is "Ubuntu System Restore and Backup utility".

  • Why did you like this idea?

At today's prices of hard drives and cloud storage it is unbelievable that there are still people who have neither. At least a part of that problem lies in the misconception that backup is hard to make and maintain. Apple introduced the idea quite effectively to its users and I believe that the same can be achieved with such a user-oriented Linux distribution as Ubuntu.

  • Please describe a tentative project architecture or an approach to it:

Primary goals for home user backup solution are

  1. Simple to use
  2. High automation (to depend on users input as little as possible)
  3. Integration

Priority of components

  1. Deja Dup improvements
  2. Nautilus extension
  3. Indicator Applet for Deja Dup

If time remains:

  4. First-boot script to suggest user to configure backup

General user experience

Since the focus of the project is home users, simplicity should be kept in mind the whole time.

As such, backup should not be done just by a program located in some sub-menu of Application menu but should be presented to user from day one - from installation/upgrade - and from that day on with an Indicator Applet. This will of course mean big extensions to Deja Dup itself and will mean that the project will require work with several other programs/components.

Installation/Upgrade

After fresh installation user should be first given an option of restoring from an existing backup. If he selects this option, he must plug in his external hard drive or give credentials of his Ubuntu One/Amazon S3 account.

If user does not want to restore he is presented with backup options. As such, he is reminded that:

  1. Ubuntu now gives its users 2 GB of storage for free in the cloud and that this space can be upgraded to 50 GB
  2. Backup can be made to Amazon S3
  3. If they don't have an external hard drive yet it is highly recommended that they purchase it
  4. Possibly other options that Deja Dup supports

At the same time user should probably be specifically reminded that those 2 GB should be used at least for the backup of settings and documents (since those are, for most users, by far the most important and are not replicable by download). If user would then decide to go along with recommendations, he should either enter Ubuntu One's credentials or be given an option of creating a new account at which point initial backup of home directory should be done.

I have yet to decide and it is possibly open to debate whether or not the complete configuration of backup should be done with installation wizard (Ubiquity) or later when user logs in for the first time. Either way, user should configure his initial backup directories and what he wants to backup - for example, it would probably be best to select home directory as source directory but it could be that Downloads and Videos directories are excluded if user is backing up to remote location (in case of external hard drives even those files are selected since they are quick to back up).

An argument presented to me by Evan Dandrea is that, in his experience with Ubiquity and migration-assistant (on which he worked at GSOC 2006), coding for Ubiquity can be quite tricky since it is filled with edge cases (e.g. how will the user connect to the internet? Are disks properly formatted?), and that its code base is quite difficult to understand and work with in the time span allocated to GSOC and would consume a disproportionately large amount of time and resources.

Following his advice, I am now a lot more inclined to think that, if any time is left for actual integration with the installation process, backup configuration and a possible restore should be done after first boot and, if possible, after the network has already been configured.

Deja Dup and configuration of backup

When configured, Deja Dup would be used to do backups on a regular basis. I would change the default backup schedule to hourly for the last few days and store those backups as incremental updates, so that the user can see the history of a specific file.

Deja Dup would also regularly notify the user about the status of backup (time of last backup, failure, possibly actions that were taken on the file system - especially if any file was removed - with an option of restoring a file, etc.) through the Indicator Applet. Statuses should be reported on a source-per-source basis (separate list of data for external hard drive, Ubuntu One, etc.)

One aspect of Deja Dup that needs addressing is backing up to multiple sources and the selection of these sources. For this, Deja Dup's interface will have to be changed at least to some extent, since going to Preferences to change the backup destination is far too awkward. Not just that - Deja Dup should be changed so that, if the user configures it so, it backs up settings and documents (and certain other smaller files if they only have 2 GB in the cloud) to Ubuntu One even if the user already has an external hard drive. The usefulness of this is obvious - if the user has his computer and external hard drive at the same location and some kind of disaster or theft happens, his data is still available.

In case of Ubuntu One it might be interesting for users if we would check what kind of plan user has and suggest backup configuration according to this automatically. Since file-based and not just directory-based backup will be implemented anyway we could scan his home directory and add directories on the following priorities if previously selected directories/files haven't exceeded the limit (especially important in case of 2GB package) - Settings, Documents, Pictures, Videos, Downloads.
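The priority-based selection described above could look roughly like the following Python sketch. It is only an illustration: the function name, the size dictionary and the choice to skip a directory that does not fit (instead of stopping at the first one that exceeds the limit) are assumptions, not existing Deja Dup behaviour.

```python
MB = 1024 ** 2

# Priority order taken from the proposal text.
PRIORITY = ("Settings", "Documents", "Pictures", "Videos", "Downloads")


def pick_directories(sizes, quota_bytes, priority=PRIORITY):
    """Greedily pick directories in priority order while they fit the quota.

    sizes maps a directory label to its size in bytes (hypothetical input,
    e.g. produced by scanning the home directory beforehand).
    Returns the selected labels and the number of bytes they use.
    """
    selected = []
    used = 0
    for name in priority:
        size = sizes.get(name)
        if size is None:
            continue  # directory does not exist for this user
        if used + size > quota_bytes:
            continue  # too big for the remaining space, try the next one
        selected.append(name)
        used += size
    return selected, used


if __name__ == "__main__":
    example = {
        "Settings": 50 * MB,
        "Documents": 500 * MB,
        "Pictures": 3000 * MB,
        "Videos": 20000 * MB,
        "Downloads": 1024 * MB,
    }
    # With the free 2 GB plan only the smaller directories are suggested.
    print(pick_directories(example, 2048 * MB))
```

Skipping instead of stopping lets a small low-priority directory use space that a large high-priority one could not; whether that is the desired behaviour is open to discussion.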

The easiest solution for all of this is that backup is always done to all available sources that were configured which should also be the default choice. Since option of "Restore" that is available on default view is used to do a full restore, it should be renamed to "Full restore". "Backup", on the other hand, should stay as it is. It should, however, do backup to all configured sources.

If, on the other hand, user specifically sets that backup should be done to single source (or smaller set of sources) only, another control should be shown in the main window. The best would be Drop Down List. Since picture is worth more than thousand words I am attaching a PDF document with wireframes.

Wire-frame document also includes a demo screen of preferences as this would now no longer need drop down list for sources.

This list should show all selected sources, and when the user connects a new device to which we already did a backup in the past, Deja Dup should ask him whether or not he would like to perform a backup to this device (and add it to the list). The dialog should also offer an option to remember the choice. If the user confirms backup to this source, the source should now also be displayed at the bottom.

On the source profile window one could also set how often backup to this location should be performed. This and previous changes effectively solve the problem of profiles without adding unnecessary complexity.

Another improvement that could be implemented is the selection of directory that should be restored, although this might be redundant if my improvements to Nautilus are implemented and therefore unnecessary. It however might be necessary to develop this feature if it is realized that users replaced Nautilus with some other file browser though it could be argued that in such a case home backup solution is also probably inadequate for them.

Nautilus improvements

Since Deja Dup is a separate program that will run in the background (or whose jobs will), working with files and directories belongs more naturally in the domain of the file browser. It should be implemented as an extension to Nautilus that would add additional items to the pop-up menu shown when the user clicks with his right mouse button, and possibly another option in the menu bar under the View menu (as in "Show deleted files"). The same or another extension will allow the user to go through the history of the directory with a slider, although the same action could be achieved by adding another option in the Side Pane with all revisions of the backup shown as a list.

Mouse control

Mouse control is faster than opening Side Pane, but can prove clumsy after displaying more than around a dozen changes and should therefore also have a button for opening Side Pane or showing slider.

In case of a file, the user should see the history of the file (e.g. "Modified 2 hours ago", "Modified two days ago", "Modified on 15. October 2009") with an option of restoring to that version and seeing the difference from that version.

In case of a directory the user should see the same list, although with different values (e.g. "File 'abc' deleted 2 days ago" with an option to restore that file).

If user would click with right mouse button in a directory on an empty space it should also show the history of the directory.
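The history labels given as examples above could be produced along these lines. This is a minimal Python sketch; the function name and the thresholds at which the label switches from hours to days to an absolute date are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def _plural(count, unit):
    """Return e.g. '1 hour' or '2 hours'."""
    return f"{count} {unit}" + ("" if count == 1 else "s")


def history_label(changed_at, now):
    """Describe when a backed-up version was modified, relative to now."""
    age = now - changed_at
    if age < timedelta(hours=1):
        minutes = max(1, int(age.total_seconds() // 60))
        return f"Modified {_plural(minutes, 'minute')} ago"
    if age < timedelta(days=1):
        hours = int(age.total_seconds() // 3600)
        return f"Modified {_plural(hours, 'hour')} ago"
    if age < timedelta(days=7):  # assumed cut-off for relative labels
        return f"Modified {_plural(age.days, 'day')} ago"
    # Older versions get an absolute date in the style used in the proposal.
    month = MONTHS[changed_at.month - 1]
    return f"Modified on {changed_at.day}. {month} {changed_at.year}"


if __name__ == "__main__":
    now = datetime(2009, 12, 1, 12, 0)
    for when in (now - timedelta(hours=2),
                 now - timedelta(days=2),
                 datetime(2009, 10, 15, 9, 30)):
        print(history_label(when, now))
```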

Slider/Side Pane

Another option that should be used for looking back more thoroughly through the history of a directory is a slider with fixed points at the top of the Main View - similar to how Trash shows "Empty Trash" at the top of the Main Window - with a distinct color change.

The benefit of a slider is that it is faster to navigate with it, but it cannot show the date until the user changes its value. The Side Pane, on the other hand, can show all the data at the same time, but that could prove overwhelming for home users (especially if this list goes back for months with a lot of changes).

The slider concept has already been tested by Apple with its Time Machine and it feels quite natural to use. Whether or not we should make it as sexy as Apple's is a matter of discussion - when Apple's Time Machine opens, it brings a quite distinct look which is not consistent with the rest of its interface. This has both its benefits and weaknesses. On one hand, it is clear that the user is doing something rather special (which backup recovery normally is), but the inconsistency could confuse some users until they understand the concept.


Technical implementation

For a start I implemented a help page for /tests/interactive (my first branch) to help myself and other developers with the prevailing tool used for development of Deja Dup, since it automatically uses binaries compiled by make and it enables the testing of new code without overriding the default system settings for Deja Dup. From this also follows that all development will take place on Launchpad with Bazaar as the version control system.

As for the work required by Deja Dup and its bugs - the majority of my suggested changes are related to Deja Dup itself, and since most of its code is written in Vala I plan to use Vala extensively. For the suggested GUI changes I plan to use Glade to minimize the time consumed by writing GUI code.

I would not go into too much details regarding specific technical implementation since how things should perform was discussed in previous section and describing algorithms in pseudo-Vala would mean that I would practically implement most of the required work already.

After getting familiar with things that Vala brings to table most changes are reduced to work regarding GUI (for which Glade with its nice documentation should be used) and Vala/C code.


Check disk space: Since Deja Dup should verify that we have enough free space available, one possible (quick) solution would go in this order:

  1. Check the free space of the selected partition with either system() and running df or with statvfs()
  2. Run the same backup action as is planned but with --dry-run option
  3. Get size of files that are to be backed up
  4. Compare the two numbers and warn if necessary (it would probably be smart to also warn if disk is about to start running low)
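The comparison in steps 1, 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not Deja Dup code: the function names and the 10% low-space margin are hypothetical, and the size estimate simply walks the source directories instead of parsing the output of a --dry-run, so it is an upper bound for an uncompressed full backup.

```python
import os
import shutil


def total_size(paths):
    """Sum the sizes of regular files under the given paths (symlinks skipped)."""
    total = 0
    for path in paths:
        if os.path.isfile(path) and not os.path.islink(path):
            total += os.path.getsize(path)
            continue
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                if os.path.islink(full):
                    continue
                try:
                    total += os.path.getsize(full)
                except OSError:
                    pass  # file vanished or is unreadable, ignore it
    return total


def classify_space(needed, free, total, low_margin=0.10):
    """Return 'insufficient', 'low' or 'ok' for a planned backup.

    low_margin is an assumed threshold: warn when less than this
    fraction of the disk would remain free after the backup.
    """
    if needed > free:
        return "insufficient"
    if free - needed < low_margin * total:
        return "low"
    return "ok"


def check_backup_fits(sources, destination):
    """Compare the size of the sources with free space at the destination."""
    usage = shutil.disk_usage(destination)  # uses statvfs() on Unix
    return classify_space(total_size(sources), usage.free, usage.total)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(check_backup_fits([home], "/"))
```

Keeping the comparison in a separate pure function makes the warning thresholds easy to test without touching a real disk.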

Add Amazon S3 support to GVFS: I am not sure whether or not this should be implemented since it has been suggested already that this should be moved upstream.

Removable disk experience: Neglecting changes required to GUI the problem of checking mount status is reduced to trying to mount the device and checking the status that mount returns or checking /etc/mtab for our device. Checking when the disk is connected can be done with udev and its rules - setting a rule when a USB disk is connected and running our script that would start Deja Dup with a special parameter and would offer to configure the disk for the user.
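Checking /etc/mtab for a device, as mentioned above, could be sketched like this in Python. The function names are hypothetical, and only the \040 (space) escape is decoded for brevity; the udev side is not shown because it depends on a Deja Dup parameter that does not exist yet.

```python
def mounted_at(device, mtab_text):
    """Return the mount point of device from mtab-style text, or None.

    Each mtab line has the form:
        device mountpoint fstype options dump pass
    Spaces inside a mount point are written as the escape \\040.
    """
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            return fields[1].replace("\\040", " ")
    return None


def is_mounted(device, mtab_path="/etc/mtab"):
    """Check the live mount table for the given device node."""
    with open(mtab_path) as mtab:
        return mounted_at(device, mtab.read()) is not None


if __name__ == "__main__":
    sample = (
        "/dev/sda1 / ext4 rw,errors=remount-ro 0 0\n"
        "/dev/sdb1 /media/My\\040Disk vfat rw,nosuid,nodev 0 0\n"
    )
    print(mounted_at("/dev/sdb1", sample))
    print(mounted_at("/dev/sdc1", sample))
```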

Slow backup speed & unprotected password in-memory: These problems seem to be hard to reproduce or unrelated to Deja Dup specifically, so it is open for further discussion whether they should be worked on during GSOC.

Provide a way to use public key encryption: This is mainly a question of presenting a proper UI to the user and making sure proper keys are used. It might be worth thinking about integrating this into the general Deja Dup interface (and possibly moving it to preferences instead of just a password box).


I plan to do the integration with Nautilus in Python if my mentor accepts such a solution, and in C otherwise - the latter option is possibly more attractive because I saw in the repository that some code for Nautilus integration has already been written, but it hasn't been written in such quantities that migration to Python would take a considerable amount of time. It is therefore a matter of discussion with Michael Terry what his opinion is on the status of that code.

If I have enough time to also improve the installation/upgrade experience, work with Ubiquity will also be done in Python since it is already written mostly in it. It was argued, however, that this solution can be very time consuming, and therefore a backup configuration wizard that is run at the first boot is a far likelier option - in such a case I will again most probably go with either Vala (to try to maintain a consistent language base across the project) or Python.

Note on my knowledge of Vala

Personally I had not worked with it yet but I had touched C# in the past. My rough estimate of my knowledge would be that I can read it and find my way around and with a bit of documentation can also write it. At the very start I would probably not be the fastest hacker on the planet for this but I am confident that I can get up to speed with Vala/C# before the GSOC work begins and be ready to hit the ground running.

My confidence comes from the fact that I have a long history of working with various programming languages. Most of my professional development was done with Python and Javascript, but I also regularly work with Mathematica for my academic needs of data processing/numerical analysis. In the past I worked with PHP, played with Lisp, worked with C and Objective-C and, as said, briefly played with C#.

  • Give us details about the milestones for this project

Even before starting to code I plan to get dirty with C# and Vala so that I will be familiar with it when I start to officially code on 26th of May.

Although Google suggests that from 9th of August to 16th of August we should write tests, documentation and that kind of stuff, I plan to write the majority of this at least in rough form while I work (and while things are still fresh in my memory). My plan is to dedicate half a day or, if needed, an entire day each week (probably during weekends) to writing the most urgent tests and documentation drafts. Hopefully this will mean that in the last week I will be turning these drafts into final masterpieces.

Plan is to move from existing bugs to adding new features to integration with other Ubuntu programs (unless required beforehand, as will be the case of Nautilus and its extensions).

May 26 - End of June

To start, I plan to work on the bugs that are in the current release and then move on to bugs that are targeted for version 16. This also covers a lot of the things that I would like to do with my specification.

Week 1:

  • Deja Dup should verify if there is enough free space;
  • Make the removable-drive experience rock - might be a bit tricky, but I will probably still manage to accomplish most of it in the first week;

Week 2:

After completing High and Medium priority bugs it is a matter of discussion with a mentor whether it is better to move to Low priority/Undecided bugs or to move on to the wish list already.

Low priority:

  • Ignores include directories with a symlink in the path;
  • Passwords aren't protected in-memory;
  • Slow backup speed;
  • If time remains, begin work on Profiles for different target locations;

Week 3:

  • Since Profiles for different target locations will require changes to UI it might take entire week;
  • Might work on "smart" backup removal like in Back in time;

Week 4:

  • Expose better UI for file-based restore (Nautilus extension)
  • Add a 'restore missing files' view (?)
  • When destination is full, offer to delete older backups (?);

Week 5:

  • Anything that would not be completed at this stage regarding existing Deja Dup bugs

July

Week 1:

  • If I successfully complete everything by this stage, I move on, but to be on the safe side first week of July is reserved for bugs that would still not be resolved properly

Week 2:

  • Add ability to select files, not only directories
  • Add Amazon S3 support to gvfs (are we sure about this? Sebastien suggested that this should be reported upstream: https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/551982/comments/1)
  • Restore installed packages as well as files (might be tricky if we do this since this effectively means system wide restore - possibly out of scope?)

Week 3:

  • Add optical media support - could take all week since there is no existing code

Week 4:

  • Add integration with Indicator Applet
  • Hopefully begin integration with Ubiquity or implement a solution to configure backup system at first boot

August - 16th August

Week 1:

  • Wrapping things up, hopefully I also finish first-boot configuration wizard

Week 2:

  • Finish tests and docs

  • Why will your proposal benefit Ubuntu?

Quite simply, it will give every Ubuntu user a safe haven for their most precious belongings - their data - since backup will become so easy that there simply won't be any more excuses for not having it.

Open Source

  • Please describe any previous Open Source development experience

I have not worked on any open source project itself, but I have worked with and used almost exclusively open source tools in my career. This makes me even more eager to get dirty with the tools that I have used for years but was kind of scared to start developing. :)

As mentioned briefly in the project proposal, my most extensive experience is in programming with Python and Javascript.

Of the Python-based tools, I have used Django and Turbogears, worked with Flix Engine (closed source, but based on mplayer) for video processing, done data analysis with scipy and plotted that data with matplotlib, and at one point or another used a number of other tools.

Most of my Javascript was written either pure or with jQuery although I had used a number of other libraries (Prototype, Mootools, MochiKit, YUI).

I have Ubuntu on all of my home computers and had in the past played with a number of different distributions (from Red Hat to Gentoo) but I also run OS X on my laptop so I have experiences with other *nix architectures as well.

  • Why are you interested in Open Source?

Because it has produced some of the most useful tools, on which even commercial products are based, or (where the products are competitors) has on numerous occasions outpaced the commercial competition. That these tools are available to everyone to play with and modify is a concept that is just too nice to miss out on.

Availability

  • How long will the project take? When can you begin?

I can safely say that the project will take all of the time allocated for GSOC, since even if I do everything from Ubuntu's proposal I will still try to enhance the user experience with enhancements for Ubiquity.

  • How much time do you expect to dedicate to this project? (weekly)

I plan to work on it daily, anywhere from 5 to 10 hours - depending on my obligations and whether or not I am behind or ahead of schedule. On average I will probably work the specified amount of 40 to 50, maybe 60 hours per week.

  • Where will you be based during the summer?

At home in Slovenia. If such arrangement can be made and is desired by Ubuntu I have no problem moving to South Africa or some other location.

  • Do you have any commitments for the summer? (holidays/work/summer courses)

I have no plans for holidays yet but I might take a week off - in such a case I will schedule my work accordingly and go to such holiday destination that will have internet and allow for work to be done. On holidays I will work on a shorter (5 hour) schedule and will do more work in the weeks before and after.

I have no other plans for work. I will however have some exams in June and July but those should not disturb my schedule too much.

  • Please designate a back up student (in case you need to withdraw your application)

There were two or three students that suggested working on this - since I do not personally know any of them, I might suggest any one of them.

If however I can suggest someone that I know, I know several local hackers that would probably be more than capable of doing the work.

Other

  • Have you ever participated in a previous GSoC? (describe your project)

No.

  • Have you applied for any other 2010 Summer of Code projects? If yes, which ones?

Not yet, but I am thinking of one or two. If I do, I will make appropriate notifications.

  • Why did you apply for the Google Summer of Code ?

I consider GSOC an excellent introduction into the realm of development of open source software that I use on daily basis. Since projects are done under the supervision of mentors I believe that, for a start, a lot of unnecessary hassle that can get you down when you get into development of such complex systems as are all large OSS projects can be either reduced to mere trivialities or removed all together.

  • Why did you choose Ubuntu as a mentoring organization?

It is in my opinion (and verified by accomplishments of Ubuntu/Canonical) one of the most promising OSS projects and as such represents one of the best opportunities to start developing OSS software that is used by millions on daily basis.

  • Why do you want to participate and why should Ubuntu choose you?

When I decided that Ubuntu would be a perfect organization at which to start developing OSS software I was left with a decision of what to develop. Although I have experiences with Ubuntu I was still wondering what might that be.

Since other members of my family are average computer users and I know what kind of problems they can have when handling computer problems that are out of domain of their everyday work, I knew that I wanted to do something that would help exactly this kind of users - ones that know what a computer is, know for what it can be used, know and understand the problems but at the same time are not skilled computer technicians and nor do they wish to become.

I knew that backup can be quite tricky for such users but, truth be told, I would most probably not have suggested the idea myself since for my family I resolved the problem of backup with a few simple scripts - a solution that is not appropriate for other such users. It's inappropriate even for us at home since, god forbid, if I ever have some kind of health problem and am unable to administer the system, no one at home would be able to access the data.

When I saw the suggestion of a backup solution and did a bit more research on the topic (I saw that a solution, Deja Dup, exists but still needs a lot of work and needs to be far better integrated into Ubuntu - even this need for research can sometimes be a deal breaker for users since they sometimes underestimate the risk of data loss) I immediately saw that this is an area in which I could really make a difference! And not just a difference as in "make some software better and less bug-riddled", but a difference that can help thousands of users who are at risk of losing data and for whom I know that such a simple and reliable system would be a God-given gift! Since I have a long-standing wish to somehow get into OSS development, I saw this as an ideal opportunity to join the wonderful community of developers that Ubuntu is.

As for why should Ubuntu choose me... In summary, I would say that the reasons for this would be years of experience with programming languages, various computer tools, fluent written and spoken English, my experiences with writing specifications and documentation (maybe a good example at this point could be this application and my specification regarding implementation of the solution ;)), my confidence in always finding a solution, my extensive experiences with working in teams (I believe that communication should be done when needed and not just every week/month so that problems are discovered when they arise and can therefore be resolved much faster - but, as suggested, at a minimum at least on a weekly basis) and alone (reading documentation isn't exactly something for what I would waste my mentor's time ;)), my history of projects that I successfully completed that should testify that I can complete things successfully (CV) and, above all - my passion.

GSoC/2010/UrbanSkudnik (last edited 2010-04-08 21:40:26 by Neo--)