LanguagePacksForUniverse

Summary

Universe lacks any kind of Rosetta integration, so it is difficult to deliver translation updates for it. We need to provide a solution like the one used for main, or an improved one.

Rationale

Although we don't support Universe, people are interested in applications from Universe, and so there are people interested in translating them and in seeing them fully translated.

This would make a fully translated distribution possible.

Use cases

  • Peter wants to translate VLC for dapper. He knows it's not supported by Canonical, but he really likes the application, so he translates it and gets his translation deployed with the next language packs.
  • Jordi finds a bug in the Catalan translation of Galeon, which is in universe. He fixes it and waits for the next language pack update to see it fixed for all Catalan users of that application.

Scope

This spec requires updates in:

  • Rosetta.
  • Depending on the solution chosen, we would need to update apt-get, update-manager, langpackomatic, libc, etc...

Design

We need a system that allows us to distribute translations for any application in Universe. It should fit the following requirements:

  • You would install only the translations you are interested in: no extra languages unrelated to your preferred ones, and no translations for applications you don't have installed.
  • You should be notified when a new translation is available so you can get updates, even after release time.
  • Updates must not overwrite any dpkg-managed file on the system that is not related to language packs.
  • The metadata needed to handle translation updates should be as small as possible.
  • The system must not be .po/.mo specific; it should also update other files such as documentation, .desktop files, Firefox/OpenOffice language packs, and any other translatable resource on the system.
  • We should be able to extend it to support our installer. That would let us use this solution with our 'main' repository, so even if we choose a different approach from the one used in main, we could migrate main to the new solution.
  • The solution should be able to use mirrors to distribute the files so we don't overload Launchpad or a single server.
  • The translations should be installed and available as soon as the application is installed. Users should not need to be aware that the translations are not part of the binary package; they just want their application fully translated right after installation.
  • When an application is removed, its translations should be removed too so we don't waste resources.

We will use exactly the same solution we are using in main. All translations for a group of languages are stored inside a single .deb package with a -base package for the initial distribution release and a -updates package for the monthly updates after release. Also, all translations from the binary packages are stripped so we only ship translations as part of the language packs.

Design Rationale

While thinking about this, we found several possible solutions, each with its pros and cons. We document them all here so that we have them at hand if the requirements change:

  1. Use exactly the same solution we are using in main. All translations for a group of languages are stored inside a single .deb package with a -base package for the initial distribution release and a -updates package for the monthly updates after release. Also, all translations from the binary packages are stripped so we only ship translations as part of the language packs.
    • Pros:
      • Simple to manage: we already have all the infrastructure to create, maintain and deploy it.
    • Cons:
      • The number of packages in universe is higher than in main, so the language packs would be bigger.
      • You would need to install a lot of translations even if you are interested in just one package from Universe.
      • We would need around 200 new packages (100 for base, 100 for the updates).
  2. Same solution as before, but without -base packages and without stripping translations from binary packages.
    • Pros over previous option:
      • The initial language pack would be much smaller, because it would only contain changes made in Ubuntu/Rosetta; the other translations would remain in the binary packages. Users would therefore have installed only the translations they need, plus all updates for universe translations.
      • We would need “only” around 100 new packages, for the -updates packages.
    • Cons over previous option:
      • If translators do a good job, the language pack would quickly grow large, so we would not gain much.
      • We would have two different language pack behaviours, because main must strip translations to save disk space on the installation/Live CD.
  3. We could create one package per language supported by each source package, so we would have something like:
    • foo_0.1-0ubuntu1_i386.deb
      language-pack-af-foo_6.06+20060603_all.deb
      language-pack-ca-foo_6.06+20060603_all.deb
      language-pack-es-foo_6.06+20060603_all.deb
      ...
      language-pack-pt-BR-foo_6.06+20060603_all.deb
      language-pack-zh-CN-foo_6.06+20060603_all.deb
      language-pack-zh-TW-foo_6.06+20060603_all.deb
      
      bar_1.3-1ubuntu13_i386.deb
      language-pack-af-bar_6.06+20060603_all.deb
      language-pack-ca-bar_6.06+20060603_all.deb
      language-pack-es-bar_6.06+20060603_all.deb
      ...
      language-pack-pt-BR-bar_6.06+20060603_all.deb
      language-pack-zh-CN-bar_6.06+20060603_all.deb
      language-pack-zh-TW-bar_6.06+20060603_all.deb
      • Pros:
        • You would install exactly the translations you want and nothing more. You would also remove them when the application is no longer in use, so less disk space is wasted.
        • We could update language packs more often than we do currently (once per month), because users would download only the translations they are actually using.
      • Cons:
        • The number of packages would explode. Say we have around 35 languages on average. After some checks, we know that only around 10% of the nearly 12000 source packages in Universe (about 1100) have translatable resources. That means we would need around 35 * 1100 = 38500 new .deb packages to ship translations, and that number would grow as people start doing more translations.
        • Because of the number of packages involved, the metadata would be huge, and most of it would be duplicated.
        • apt-get's Packages.gz file would grow a lot.
  4. Same solution as before, but extending apt-get to accept Packages-LANG.gz files, so that language packs are listed in their own Packages file. This would be backward compatible: Packages.gz would still be there, just without translations, and the new apt-get would take advantage of the language packs if they are present.
    • Pros:
      • Same as before.
      • Mirrors could mirror just the languages they are interested in.
    • Cons:
      • Same as before, except that Packages.gz would stay at an acceptable size.
  5. Same solution as before, but using separate components instead of adding the Packages-LANG.gz files. The advantage over the previous solution is that apt-get does not need to change. The disadvantage is that we would need around 35 * len(main, universe, multiverse, restricted) new components.

  6. Finally, we could stop using .deb packages to handle translations and create our own apt-lite system, integrated with update-manager and apt-get, with no dependencies or versioning system other than a catalogue listing source packages and links to their translatable resources. The system would just use MD5 sums to know whether you have the latest version of a translation and would notify you about new translations. You would get translations only for installed packages, and the system would remove translations for uninstalled applications. The files would be installed in a directory (/usr/share/locale-langpack) that is not handled by dpkg, so we would not overwrite anything managed by dpkg.

    • Pros:
      • The metadata would be very small.
      • We could update all translations daily if we wanted. Updating one would not force an update of the other translatable resources, because every file is independent of the others.
    • Cons:
      • Because we don't handle versions, we could not restore a distro release to the exact translation state it shipped with. Jeff Bailey pointed this out as a possible problem for the support team.
      • We would need to write a lightweight implementation of apt-get.
      • The mirroring scripts for Ubuntu cannot be reused to mirror translations.
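The update check described in option 6 could be sketched roughly as follows. This is only an illustration under stated assumptions: the catalogue layout (source package, then language, then relative path mapped to an MD5 sum) and the function names are hypothetical, since the spec does not define a catalogue format. The root directory stands in for /usr/share/locale-langpack.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def plan_update(catalogue, installed, languages, root):
    """Decide which translation files to fetch and which to remove.

    catalogue: hypothetical layout {source package: {language: {relative path: md5}}}
    installed: names of the source packages installed on this system
    languages: the user's preferred language codes
    root:      directory outside dpkg's control holding the translations
    Returns (to_fetch, to_remove) as sorted lists of paths relative to root.
    """
    # Collect the files this system should have: installed packages only,
    # preferred languages only.
    wanted = {}
    for package in installed:
        for language in languages:
            wanted.update(catalogue.get(package, {}).get(language, {}))

    # Fetch anything missing or whose MD5 differs from the catalogue entry.
    to_fetch = sorted(
        rel for rel, digest in wanted.items()
        if not (root / rel).is_file() or md5_of(root / rel) != digest
    )

    # Anything on disk that no installed package asks for is stale.
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    to_remove = sorted(on_disk - set(wanted))
    return to_fetch, to_remove
```

Because the comparison uses only checksums, the sketch also shows the drawback noted above: it can tell that a file differs from the catalogue, but not which of the two is newer or which release a file belongs to.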

Final decision

After checking all those options, we decided to go with solution #1. The reasons for choosing it are:

  • No new infrastructure is needed.
  • After some checks with Universe, we discovered that although universe has twice as many packages with translatable resources as main (around 1100 vs 550), the amount of translations is lower in universe than in main, so universe language packs would be smaller than the ones in main. This may change in the future, but in the meantime we can go with this simple solution.
  • Using -base and -updates packages lets us reuse exactly what we have in main, so we don't need to handle language packs differently for main and universe.
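The package counts quoted for the options in the design rationale can be reproduced with quick arithmetic. The sketch below only uses figures that appear in this spec; the constant names are made up for illustration, and the figure of 100 language pack groups is an inference from "100 for base, 100 for the updates" rather than a number the spec states directly.

```python
# Approximate figures taken from this spec.
LANGUAGES = 35                # average number of languages per package
UNIVERSE_TRANSLATABLE = 1100  # universe source packages with translatable resources
MAIN_TRANSLATABLE = 550       # the same figure for main
LANGPACK_GROUPS = 100         # assumption: inferred from "100 for base, 100 for the updates"
COMPONENTS = ("main", "universe", "multiverse", "restricted")


def package_counts():
    """Return the number of new packages or components each option would need."""
    return {
        "option 1 (-base and -updates)": 2 * LANGPACK_GROUPS,
        "option 2 (-updates only)": LANGPACK_GROUPS,
        "option 3 (per language, per source package)": LANGUAGES * UNIVERSE_TRANSLATABLE,
        "option 5 (new archive components)": LANGUAGES * len(COMPONENTS),
    }


if __name__ == "__main__":
    for option, count in package_counts().items():
        print(f"{option}: {count}")
    print("universe/main translatable ratio:", UNIVERSE_TRANSLATABLE / MAIN_TRANSLATABLE)
```

The two orders of magnitude between option 1 and option 3 are the main reason the per-package approach was set aside in favour of reusing the existing language packs.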

Data preservation and migration

Nothing is needed.

Outstanding issues

BoF agenda and discussion

ago 18 13:03:38 <sabdfl>        hi guys
ago 18 13:03:46 <danilos>       hi sabdfl 
ago 18 13:04:14 <sabdfl>        is anyone from the distro side going to join us?
ago 18 13:04:26 <carlos>        pitti
ago 18 13:04:40 -->     pitti (n=pitti@ubuntu/member/pitti) has joined #cm
ago 18 13:04:42 <pitti> hi
ago 18 13:04:43 <carlos>        hi
ago 18 13:04:50 <pitti> sorry, I have to learn when 'cm' is an abbreviation and when not :)
ago 18 13:04:52 <sabdfl>        hey pitti
ago 18 13:04:56 <pitti> hi sabdfl
ago 18 13:05:01 <carlos>        pitti: I have always the same problem ;-)
ago 18 13:05:05 <sabdfl>        ok, let's start
ago 18 13:05:08 <danilos>       sure
ago 18 13:05:11 <sabdfl>        thanks for looking at this in paris
ago 18 13:05:19 <pitti> so, I gave this some thought
ago 18 13:05:20 <sabdfl>        i wasn't in the discussions but i have read the spec
ago 18 13:05:30 <pitti> in the end, I think only solutions 1 and 6 make sense
ago 18 13:05:52 <pitti> the others seem to be abusing the archive structure and apt in nasty ways
ago 18 13:06:15 <carlos>        sabdfl: our point was not to go with current language packs because we are scared of the other solution, but that we didn't have enough time for Edgy to do all modifications
ago 18 13:06:28 <carlos>        so we choose the only realistic solution for Edgy
ago 18 13:06:29 <sabdfl>        carlos: then drop universe for edgy
ago 18 13:06:38 <pitti> sabdfl: I think it boils down to: (1) can be done with a snap of the finger and is known to work, but not really pretty
ago 18 13:06:48 <carlos>        and documented the others for edgy + 1 or any other release when we get some extra time
ago 18 13:06:54 <pitti> (6) is a lot of work, but offers better features
ago 18 13:07:21 <pitti> like direct integration into apt/synaptic/etc., so that when you install a package, you'd automatically get translations, too, etc.
ago 18 13:07:24 <sabdfl>        i would like 6, but understand that it cannot be done for edgy
ago 18 13:07:34 <carlos>        pitti: I think so, yes
ago 18 13:07:48 <sabdfl>        the "translation manager" would need to get notifications, so it would know how to go look for relevant translations, yes
ago 18 13:08:02 <pitti> sabdfl: I also think (6) should be discussed with the admins to judge the ramifications of mirroring, etc.
ago 18 13:08:02 <carlos>        danilos: just in case you don't know the document we are talking about... https://wiki.ubuntu.com/LanguagePacksForUniverse
ago 18 13:08:14 <danilos>       carlos: yeah, I know what you're talking about :)
ago 18 13:08:43 <sabdfl>        how big is the pressure for SOMETHING in edgy for universe?
ago 18 13:08:53 <carlos>        not too big
ago 18 13:08:56 <pitti> btw, carlos and I shortly discussed solution (2) yesterday
ago 18 13:09:00 <sabdfl>        will people be happier if we deliver better translation of main, for example, instead?
ago 18 13:09:07 <carlos>        and we are in time to remove it now (nothing has been imported yet)
ago 18 13:09:13 <pitti> sabdfl: right now folks are crying for edgy translations
ago 18 13:09:19 <sabdfl>        of main
ago 18 13:09:20 <pitti> for dapper the process works reasonably well
ago 18 13:09:55 <danilos>       sabdfl: translation is always measured in its completeness, and it depends on all software being translated
ago 18 13:09:57 <sabdfl>        are there any other nice features planned for edgy, like package description translations, or a better way to translate the menu items?
ago 18 13:10:04 <pitti> sabdfl: I didn't hear particular requests for universe translations yet
ago 18 13:10:08 <carlos>        We already promissed universe lang packs not sure if officially, but people know about it, but I don't think it would be too bad to defer that feature
ago 18 13:10:11 <danilos>       sabdfl: especially with things such as "stock" gtk+ buttons, menus, which end up translated even in non-translated stuff
ago 18 13:10:14 <pitti> but that might also be due to the fact that you don't miss what isn't there :)
ago 18 13:10:21 <sabdfl>        danilos: we have SUCH a lot of stuff in main that could be better translated, i think universe is not the highest priority
ago 18 13:10:50 <danilos>       sabdfl: it's not about A LOT, it's about WHAT: if people use regularly some stuff from universe, they'd want to have it translated, IMHO
ago 18 13:11:00 <carlos>        sabdfl: yes, I think the package descriptions will be part of Edgy
ago 18 13:11:04 <pitti> sabdfl: e. g. I think getting a better process for firefox and OpenOffice should be higher prioritized than getting an awesome universe translation structure
ago 18 13:11:09 <carlos>        at least I saw movement in that spec
ago 18 13:11:10 <sabdfl>        danilos: we won't strip the translations from the packages, so they will get pre-release translations
ago 18 13:11:18 <danilos>       sabdfl: yeah, right
ago 18 13:11:20 <sabdfl>        could we stuff rosetta translations INTO packages at build time?
ago 18 13:11:21 <carlos>        pitti: we are doing that already for Edgy
ago 18 13:11:40 <pitti> btw, can we shortly discuss the 'not strip packages' case?
ago 18 13:11:41 <carlos>        sabdfl: I don't think that's possible until bazaar integration is done
ago 18 13:11:41 <sabdfl>        so, for example, if there is a security update then you get newer translations too as a bonus?
ago 18 13:11:44 <pitti> this would be solution (2)
ago 18 13:12:06 <danilos>       I don't know that much about our packaging infrastructure to be able to answer any of this :)
ago 18 13:12:09 <pitti> sabdfl: this would work, it would just be a huge hit if we updated translations without other updates
ago 18 13:12:13 <sabdfl>        carlos: why, surely it can be done as an explicit msgset merge command during the package build?
ago 18 13:12:16 <pitti> and universe gets *very* few updates post-release
ago 18 13:12:47 <pitti> users will rightfully yell at us if they have to re-download all their universe packages just to get some new strings
ago 18 13:12:51 <sabdfl>        pitti: well, it would allow us to say "get in and translate universe before release and most of those translations will go into the release itself"
ago 18 13:12:52 <danilos>       pitti: wouldn't that solution involve patching all the software in universe?
ago 18 13:13:01 <pitti> that seems much worse to me than downloading an 1 MB update langpack 
ago 18 13:13:04 <sabdfl>        i agree, pakcage update just for translations would be bong
ago 18 13:13:37 <carlos>        sabdfl: we don't have a way to get .po files directly from rosetta so we would need to develop a new script for that with some kind of XMLRPC API because parsing emails doesn't sounds like a good solution...
ago 18 13:13:39 <pitti> sabdfl: right, pre-release that's certainly fine, but still requires package rebuilds, version bumps, and lots of bandwith waste
ago 18 13:14:08 <sabdfl>        pitti: true - a huge rebuild at rc1 would cause potential issues
ago 18 13:14:17 <pitti> sabdfl: TBH I'd prefer update langpacks (solution 2) over rebuilding packages, at least with our current way of doing source packages
ago 18 13:14:19 <danilos>       i.e. I don't understand how did anyone think solution 2 would work: more patching of glibc gettext() stuff?
ago 18 13:14:22 <sabdfl>        and we would have no easy way to correct bad rosetta translations discovered after release
ago 18 13:14:42 <pitti> danilos: yes, that's what I wanted to discuss
ago 18 13:14:49 <pitti> I think we have to do a glibc change anyway
ago 18 13:15:17 <pitti> I think it's a bad idea to rebuild all 1100 universe packages just to strip all universe translations
ago 18 13:15:24 <pitti> (1100 universe packages that have translations, that is)
ago 18 13:15:38 <danilos>       pitti: but this would be for using translations from two different domains, or simply including a complete PO/MO file if there was a change in Rosetta?
ago 18 13:15:45 <pitti> so with both solution (1) and (2) we need that glibc hack for a transition period
ago 18 13:16:04 <carlos>        danilos: the latter
ago 18 13:16:08 <pitti> danilos: I'd go with updating a complete mo
ago 18 13:16:09 <danilos>       I mean, after a while, it would be simply including all stuff in langpacks
ago 18 13:16:15 <carlos>        danilos: right
ago 18 13:16:16 <pitti> danilos: otherwise the runtime impact is too high
ago 18 13:16:22 <danilos>       pitti: agreed
ago 18 13:16:26 <pitti> danilos: with solution 1 that is
ago 18 13:16:48 <pitti> but we should first agree to a high-level goal
ago 18 13:16:55 <danilos>       sure
ago 18 13:17:00 <sabdfl>        is (2) ever going to be better than (6) if we can actually do (6)?
ago 18 13:17:15 <sabdfl>        if not, then I would rather defer, and do (6) for edgy+1
ago 18 13:17:16 <pitti> do we want universe translations, like *now*, or do we want to do them in a really sophisticated and cool way, but in maybe one year?
ago 18 13:17:23 <sabdfl>        pitti: the latter
ago 18 13:17:28 <danilos>       btw, do you have any estimate of the size of per-language langpacks for universe? I figure they'd be huge
ago 18 13:17:33 <carlos>        then edgy should not have universe translations
ago 18 13:17:36 <carlos>        updates
ago 18 13:17:36 <pitti> i. e. (1)/(2) now or (6) later
ago 18 13:17:44 <pitti> danilos: no, they aren't
ago 18 13:17:46 <carlos>        danilos: less than main
ago 18 13:17:50 <pitti> danilos: in fact ATM they would be smaller than main
ago 18 13:17:55 <danilos>       hum, nice :)
ago 18 13:18:08 <pitti> the most translation-laden stuff is desktop stuff, which is mostly in main
ago 18 13:18:11 <carlos>        danilos: universe packages translations are really bad compared with main ones
ago 18 13:18:11 <danilos>       I'd vote for better implementation, even if it delays it
ago 18 13:18:13 <sabdfl>        danilos: the packages are less popular, therefor less translated (and often less likely to be translatable)
ago 18 13:18:39 <sabdfl>        ok, this looks clear to me - don't try to do it badly for edgy, focus on better quality translations of main
ago 18 13:18:53 <pitti> that's why we stated that global universe langpacks would be a possible alternative
ago 18 13:19:10 <carlos>        ok
ago 18 13:19:25 <pitti> if we do (6), then we probably lose some of the infrastructure features of apt
ago 18 13:19:38 <sabdfl>        if we do (6) we should keep it super simple
ago 18 13:19:41 <pitti> like, straightforward mirroring, CD burning, debmirror, etc.
ago 18 13:19:54 <carlos>        pitti: we added a solution for that, right?
ago 18 13:19:55 <danilos>       jigdo? :)
ago 18 13:20:01 <sabdfl>        the "packages" file should just be names and dates
ago 18 13:20:02 <pitti> but if we do a simple and efficient HTTP lookup/download and a nice integration into apt, this should cover the common case as well
ago 18 13:20:10 <carlos>        prepare a .deb package that installs the initial .mo files 
ago 18 13:20:18 <carlos>        to include it in the installation CD
ago 18 13:20:18 <sabdfl>        i would not integrate into apt, have a separate tool which handles it
ago 18 13:20:25 <sabdfl>        carlos: +1
ago 18 13:20:45 <pitti> sabdfl: apt integration> I meant 'apt-get install foo' would also take care of downloading the corresponding translations for your selected languages
ago 18 13:20:46 <danilos>       what initial .mo files? are there any from universe on installation CD?
ago 18 13:20:48 <pitti> I think that would be cool
ago 18 13:21:01 <danilos>       pitti: +1
ago 18 13:21:03 <pitti> sabdfl: likewise for g-a-i, synaptic, etc. of course
ago 18 13:21:14 <pitti> danilos: universe packages aren't shipped on CDs
ago 18 13:21:35 <danilos>       pitti: so why the need for "initial .mo files" (I may be misunderstanding this altogether)
ago 18 13:21:38 <danilos>       ?
ago 18 13:21:42 <carlos>        danilos: we start with universe and if it works, we will move it to main
ago 18 13:21:53 <carlos>        so we need a solution for main too
ago 18 13:21:56 <danilos>       carlos: ah, ok
ago 18 13:22:02 <carlos>        even if we are not going to use it with first release
ago 18 13:22:22 <pitti> carlos: that's what I mean with 'losing infrastructure features'
ago 18 13:22:26 <carlos>        sabdfl: the idea is to use an apt-get hook to do that installation that pitti said
ago 18 13:22:34 <pitti> i. e. it's nontrivial to ship langpacks on CDs that way
ago 18 13:22:38 <pitti> of course it's possible
ago 18 13:22:44 <carlos>        yeah
ago 18 13:22:59 <carlos>        pitti: but that's the same problem as no using apt-get and dpkg standard infrastructure
ago 18 13:23:05 <danilos>       carlos: isn't it simpler to provide a file:/// handler at the same time as http:///, so you can install them in the same way from CD?
ago 18 13:23:18 <danilos>       if we're to go with patching apt-get, that is
ago 18 13:23:38 <pitti> apt-get should just call a hook, which then calls 'update-translations pkg-foo' or so
ago 18 13:23:39 <carlos>        danilos: not sure, we should expand the spec after edgy release (or once we get all our tasks done)
ago 18 13:23:51 <pitti> apt-get itself should know as little about this langpack thingy as possible IMHO
ago 18 13:24:01 <carlos>        pitti++
ago 18 13:24:29 <carlos>        we should try to get this solution working without patching a bunch of packages
ago 18 13:24:32 <pitti> so, solution (6) needs some long and intense discussion, we should do this in person on the next meeting with admins involved, too
ago 18 13:24:58 <carlos>        pitti: agreed
ago 18 13:25:11 <carlos>        also, I think Jeff should be present too
ago 18 13:25:14 <pitti> carlos: yes, the initial infrastructure would boil down to 'download de/es translations from this server and put them into this path', and the matchign glibc patch
ago 18 13:25:26 <danilos>       pitti: yeah, and I'd still want to push some more features into language packs when that discussion happens :)
ago 18 13:25:28 <pitti> and then we need the proper integration into the package managers and language selector
ago 18 13:25:45 <sabdfl>        we could over-engineer this very easily :-)
ago 18 13:25:53 <sabdfl>        here's what I would care about
ago 18 13:26:10 <sabdfl>        (1) the index file has a date, so you always know if one is newer than another
ago 18 13:26:25 <pitti> (or just a counter)
ago 18 13:26:45 <sabdfl>        (2) the index file is as small as possible, so we don't get a Packages.gz style multi-mb download just to find out if you need to fetch new translations
ago 18 13:27:28 <sabdfl>        (3) each updated PO/MO file is listed with just distrorelease, name, language and date, so again you always know if  you have the newest
ago 18 13:27:35 <pitti> it should become something like 2000 packages * 100 languages * 50 byte per entry = 10 MB
ago 18 13:27:49 <sabdfl>        hmm... index per language, then
ago 18 13:27:53 <pitti> right
ago 18 13:28:10 <danilos>       I doubt there'd be 100 translations per package in universe
ago 18 13:28:12 <pitti> this will spread the load on the mirrors and bw requirement
ago 18 13:28:17 <danilos>       only best translated packages have that many
ago 18 13:28:20 <pitti> no, this was just a maximum estimation
ago 18 13:28:22 <sabdfl>        so system needs to track "which mo files do I need, and which do I have, and what dates are on them"
ago 18 13:28:26 <carlos>        danilos: but we would have them for main
ago 18 13:28:45 <sabdfl>        danilos: it should scale by the number of languages installed, not available
ago 18 13:28:53 <pitti> max. 100 kB index per language sounds reasonable
ago 18 13:29:00 <sabdfl>        so if people install more languages, they need more index files, not the other way round
ago 18 13:29:07 <pitti> most users will only care about one or two languages anyway
ago 18 13:29:12 <sabdfl>        exactly
ago 18 13:29:17 <danilos>       sabdfl: that's a better reason
ago 18 13:29:31 <pitti> and 100 index files on the mirrors sounds reasonable, too
ago 18 13:29:31 <sabdfl>        another nice thing is that we could make it easy to turn on automatic language updates
ago 18 13:29:40 <sabdfl>        because it's not like packages where you definitely want user approval
ago 18 13:29:44 <pitti> update-notifier! :)
ago 18 13:29:45 <sabdfl>        newer translations are better
ago 18 13:29:51 <sabdfl>        it should just DOIT
ago 18 13:29:57 <sabdfl>        unless the sysadmin has turned that off
ago 18 13:30:03 <danilos>       yeah
ago 18 13:30:10 <carlos>        sounds good
ago 18 13:30:14 <pitti> sabdfl: well, a 'do this automatically from now on' checkbox
ago 18 13:30:17 <sabdfl>        i would like us to develop this, use it for universe, and then use it for main too if it is smooth
ago 18 13:30:19 <pitti> for the poor modem users
ago 18 13:30:28 <sabdfl>        sure
ago 18 13:30:45 <carlos>        pitti: just like update-manager handles automatic .deb installation
ago 18 13:30:51 <sabdfl>        we can also then to langpack updates on a per-package, per-language basis
ago 18 13:31:00 <pitti> for this solution we would gradually strip universe packages AFAICS
ago 18 13:31:04 <danilos>       sabdfl: well, that's what we're getting with this :)
ago 18 13:31:19 <danilos>       pitti: with the next update, I'd say
ago 18 13:31:20 <pitti> per-package> yes, that's the whole point of (6)
ago 18 13:31:28 <sabdfl>        so translation teams can sign off on an updated package-language, then press a button and queue it for publishing
ago 18 13:31:50 <pitti> so at first we wouldn't do any apt-get magic, but just do a little u-n integration and some backend which downloads and installs tarballs with mo files
ago 18 13:31:55 <sabdfl>        pitti: i meant the timing of the updates, not the downloading of them
ago 18 13:32:01 <sabdfl>        at the moment, we do them all once a month
ago 18 13:32:15 <sabdfl>        but it would be better if a translation team completes a package for them to say "publish now, please"
ago 18 13:32:16 <pitti> sabdfl: I think with that granularity they could even be updated daily
ago 18 13:32:21 <carlos>        sabdfl: well, we can update just the ones updated that week...
ago 18 13:32:27 <carlos>        sabdfl: even daily ;-)
ago 18 13:32:42 <carlos>        sabdfl: we only update the ones that got new translations
ago 18 13:32:44 <sabdfl>        yes, there should be a sweeper process which pushes newer translations out automatically after a while
ago 18 13:32:46 <pitti> well, the period doesn't matter so much, it can be adapted to the mirror load
ago 18 13:32:56 <sabdfl>        but i think the translation teams will like being able to work, test work, test, "release"
ago 18 13:33:23 <sabdfl>        pitti: will you talk to archive managers about putting these in the dists tree?
ago 18 13:33:27 <danilos>       aha, option "pull directly from rosetta"? if we didn't have async exports, it would be possible today :)
ago 18 13:33:41 <sabdfl>        do we want mirrors to be able to mirror just certain languages?
ago 18 13:33:49 <carlos>        sabdfl: I think so
ago 18 13:33:52 <pitti> sabdfl: I don't think they will like it there; should probably become translations.launchpad.net or translations.ubuntu.com or so
ago 18 13:34:02 <danilos>       sabdfl: if we have indexes per language, they'd already be able to
ago 18 13:34:06 <pitti> danilos: no, not from rosetta; the DDoS will kill it 
ago 18 13:34:06 <sabdfl>        danilos: we will push daily or on-demand to a staging server, and translators can fetch their updates from there
ago 18 13:34:24 <sabdfl>        danilos: not just the indexes, but the actual mo files
ago 18 13:34:33 <sabdfl>        hmm.. that's probably so little data it does not matter hugely
ago 18 13:34:49 <danilos>       pitti: I am thinking only of the "work, test, work, test" part of sabdfl's idea: pull directly from rosetta for testing by translators
ago 18 13:34:56 <sabdfl>        pitti: it could be even smaller than 100kb
ago 18 13:34:59 <pitti> danilos: right, that would work
ago 18 13:35:15 <pitti> danilos: but we shouldn't have a single server for the default updating mechanism
ago 18 13:35:32 <danilos>       pitti: right, anyway, more discussions with infrastructure pending
ago 18 13:35:51 <sabdfl>        if the index is just domain (mo template filename) and date (or counter)
ago 18 13:35:52 <pitti> that being said, yes, I can talk to the soyuz and admin guys about a proper place to stick them to
ago 18 13:36:02 <sabdfl>        ok, thanks
ago 18 13:36:12 <carlos>        sabdfl: the index should have also a link with the sourcepackagename
ago 18 13:36:15 <pitti> the archive itself, a separate hierarchy on archive mirror, network, or a parallel network
ago 18 13:36:36 <carlos>        sabdfl: so we know the .mo files to install when we get a new package installed
ago 18 13:36:41 <sabdfl>        pitti: i would like the default to be that current mirrors just acquire this data
ago 18 13:36:52 <sabdfl>        because we are going to be taking it out of packages, which they would otherwise mirror anyway
ago 18 13:37:13 <sabdfl>        carlos: the package can register those on installation
ago 18 13:37:28 <pitti> sabdfl: true; it should work with rsync, won't work with debmirror (I don't know which is used where)
ago 18 13:37:31 <sabdfl>        so install the package, it tells the translation manager "i would use translation domains X, Y and Z"
ago 18 13:37:38 <carlos>        we don't have that information on build time, we can guess it
ago 18 13:37:44 <carlos>        sabdfl: but Rosetta knows it for sure
ago 18 13:37:47 <danilos>       can't we have a simple "install-translations-for-current-languages MOFILE" which would be run in post-install?
ago 18 13:38:01 <sabdfl>        right - and we can also fix packages like that by hand, it just has to be done once
ago 18 13:38:06 <pitti> sabdfl: hm, the index file should just be per-package, not per-domain
ago 18 13:38:06 <carlos>        sabdfl: and we can expand this to support documentation translations without needed to rebuild the packages
ago 18 13:38:30 <sabdfl>        pitti: i don't think it should be per-package, because translation domains are required unique in any event
ago 18 13:38:41 <carlos>        sabdfl: that's not always true...
ago 18 13:38:43 <sabdfl>        and a single package might have several domains
ago 18 13:38:44 <pitti> danilos: please do not mention 'simple' and 'modify all package's postinst files' in one sentence :)
ago 18 13:38:56 <carlos>        sabdfl: it should, but some upstreams don't follow that rule
ago 18 13:39:00 <danilos>       pitti: heh, ok :)
ago 18 13:39:05 <sabdfl>        it should not be in the postinst anyhow
ago 18 13:39:10 <sabdfl>        it should be async on package install
ago 18 13:39:14 <pitti> no, it shouldn't touch source packages at all
ago 18 13:39:16 <sabdfl>        install the package, then later update translations
ago 18 13:39:32 <pitti> sabdfl: I agree
ago 18 13:39:38 <sabdfl>        pitti: or touch them in an automated way that the builder can do for all packages, like stripping
ago 18 13:39:38 <pitti> update-notifier updates daily anyway
ago 18 13:39:40 <pitti> that should be enough
ago 18 13:39:44 <danilos>       sabdfl: doesn't that disconnect "list of domains" and "sourcepackagename"?
ago 18 13:39:53 <pitti> and synaptic/g-a-i can also trigger it manually
ago 18 13:40:09 <pitti> btw, domain/package name doesn't matter so much
ago 18 13:40:16 <pitti> rosetta can export a mapping between them easily
ago 18 13:40:19 <sabdfl>        danilos: the sourcepackage name is irrelevant. the binary package is the "user" of the translation data
ago 18 13:40:31 <danilos>       I should probably not comment on packaging issues which I am not very familiar with :)
ago 18 13:40:34 <sabdfl>        the binary package should just know which .mo files it *would* use
ago 18 13:40:39 <danilos>       sabdfl: yeah, that's what I meant
ago 18 13:40:54 <sabdfl>        and the translation manager can then go "ok, i'll see if I can find those for the languages we need on this system"
ago 18 13:41:07 <sabdfl>        but that should happen separately from package install
ago 18 13:41:12 <pitti> sabdfl: the 'would' is tricky :)
ago 18 13:41:17 <pitti> but why not per source package?
ago 18 13:41:24 <pitti> that wouldn't be much worse IMHO, but much easier
ago 18 13:41:33 <carlos>        sabdfl: I still think that's rosetta's job
ago 18 13:41:33 <danilos>       sabdfl: yeah, I just don't know enough of how one would find that out: "get-mo-files-for-package PACKAGE"
ago 18 13:41:46 <danilos>       know enough about packaging, that is
ago 18 13:41:47 <pitti> often a source package shares a translation domain with several binaries anyway
ago 18 13:41:47 <sabdfl>        pitti: it's no problem for the builder to say "ok, you didn't say anything about mo files, and I see your source package has these domains, you are going to say you need all of them"
ago 18 13:42:04 <sabdfl>        so by default, we jam all binaries to say they need all the mo files from their source package
ago 18 13:42:22 <pitti> sabdfl: right, we can create such a mapping in pkgstriptranslations
ago 18 13:42:30 <sabdfl>        but the developers could tweak that if it's appropriate, in the small number of cases where one binary uses one mo file, and another uses a different one, all from the same source package
ago 18 13:42:31 <pitti> it knows the domain<->binary mapping
ago 18 13:42:33 <danilos>       for perfectionists, let them run strace on programs and let us know of any differences :)
ago 18 13:42:33 <carlos>        but we already have it in Rosetta....
ago 18 13:42:48 <sabdfl>        carlos: at the source package level, yes, not the binary package level
ago 18 13:43:05 <pitti> if we need a binary->domain mapping, let's create it in pkgstriptranslations
ago 18 13:43:06 <sabdfl>        so, by default, the best we can do is assume that any binary produced by a source package, needs all the mo files produced from the same source package
ago 18 13:43:35 <sabdfl>        then maintainers who care can tweak the packages so that binaries know *exactly* which subset of mo files they actually need
ago 18 13:43:41 <carlos>        sabdfl: there are some situations
ago 18 13:43:43 <danilos>       sabdfl: right
ago 18 13:43:45 <pitti> and then the client tool would download that mapping (tiny), then check out which domains it wants, and download the domain data
ago 18 13:43:50 <carlos>        when a package doesn't have any .po file
ago 18 13:43:58 <carlos>        but Rosetta has them
ago 18 13:44:03 <pitti> carlos: true
ago 18 13:44:03 <carlos>        what would happen in that situation?
ago 18 13:44:11 <sabdfl>        good point
ago 18 13:44:14 <carlos>        the package will think there aren't translations
ago 18 13:44:28 <sabdfl>        rosetta will need to tell the build system this information
ago 18 13:44:28 <carlos>        but Rosetta knows that there are such translations
ago 18 13:44:32 <pitti> oh, and binaries might bindtextdomain() a domain which is shipped by other debs
ago 18 13:44:43 <danilos>       pitti: yeah, iso_codes :)
ago 18 13:44:45 <pitti> well, let's do it by source package then
ago 18 13:44:47 <danilos>       pitti: among the others
ago 18 13:45:01 <pitti> we can always switch the implementation transparently to per-binary on the clients
ago 18 13:45:08 <carlos>        or php applications using .po files directly instead of .mo files...
ago 18 13:45:22 <sabdfl>        it should definitely be possible for a binary to say "i would use mo file X"
ago 18 13:45:43 <sabdfl>        and it should also require no work for maintainers, for it to "just work" by default, even if that results in slightly more mo files installed than needed
ago 18 13:45:45 <danilos>       sabdfl: yeah, that was my idea of postinst, but there's probably a better way to do it :)
ago 18 13:45:53 <sabdfl>        and then we can tweak the packaging to get the dependencies just right
ago 18 13:45:57 <pitti> sabdfl: at runtime, this is easy, but we need to figure it out statically
ago 18 13:46:16 <sabdfl>        pitti: how do you mean, statically?
ago 18 13:46:29 <pitti> sabdfl: i. e. determine the domains an executable wants without executing it
ago 18 13:46:35 <pitti> on the buildds, for example
ago 18 13:46:41 <danilos>       pitti: how hard would that be? if sourcepackage -> binarypackage1, binarypackage2, then domains for both binarypackages are same as source's ones
ago 18 13:47:02 <pitti> danilos: well, that's exactly source package granularity, or am I missing something?
ago 18 13:47:15 <danilos>       pitti: this is the first approximation, that's what sabdfl is proposing afaigi
ago 18 13:47:30 <danilos>       and simply allow binary packages to be more specific
ago 18 13:47:33 <carlos>        danilos: mark wants to install only a .mo file if the binary package uses it, so if binarypackage2 doesn't use any, we don't install any .mo file
ago 18 13:47:43 <danilos>       i.e. you generate a static list like this, you can later modify it by hand
ago 18 13:47:49 <sabdfl>        pitti: well, the binary package would either:
ago 18 13:47:49 <sabdfl>         (a) have the mo files included, in which case we can strip them, and tell it to register its "dependency"
ago 18 13:47:49 <sabdfl>         (b) depend on another binary package, which provides them, which would have the same thing happen to it
ago 18 13:48:00 <danilos>       in most cases, you'd need no changes for most binary packages
ago 18 13:48:17 <pitti> sabdfl: unless we only have translations in Rosetta
ago 18 13:48:21 <pitti> sabdfl: i. e. for new languages
ago 18 13:48:46 <sabdfl>        pitti: if there are ANY translations, we register the domain name
ago 18 13:48:47 <danilos>       pitti: wouldn't you generate indexes and langpacks based on rosetta data anyway?
ago 18 13:48:52 <carlos>        well
ago 18 13:48:54 <carlos>        and KDE packages
ago 18 13:48:55 <pitti> sabdfl: yes, I agree it's a corner case
ago 18 13:49:02 <sabdfl>        then if we add translations for new languages, the system translation manager would see them and fetch them
ago 18 13:49:07 <carlos>        the binaries don't include any .po/.mo files
ago 18 13:49:13 <carlos>        they have their own language pack
ago 18 13:49:15 <pitti> carlos: right, KDE is spethial, too
ago 18 13:49:19 <sabdfl>        should we spec this in november?
ago 18 13:49:25 <sabdfl>        we know we won't get to it for edgy
ago 18 13:49:32 <pitti> sabdfl: november> yes, would be nice
ago 18 13:49:37 <sabdfl>        and we have 1.0  goals for rosetta
ago 18 13:49:44 <sabdfl>        ok, then i'm very happy we had this conversation
ago 18 13:49:48 <carlos>        sabdfl: yeah, I prefer to defer this until next meeting and talk a bit more face to face
ago 18 13:50:01 <pitti> if we all agree to do it right instead of do it now, then we should put more brain into this
ago 18 13:50:03 <sabdfl>        carlos, will you summarise for the lists, and update the spec to say we will talk about it at the whole company meet in november?
ago 18 13:50:08 <carlos>        I'm going to add this log to current spec
ago 18 13:50:12 <sabdfl>        pitti: agreed
ago 18 13:50:15 <carlos>        so we don't miss any of the points we did
ago 18 13:50:15 <danilos>       carlos: great, thanks
ago 18 13:50:22 <sabdfl>        thanks guys!
ago 18 13:50:27 <pitti> thank you, too
ago 18 13:50:33 <pitti> I'll let this mature a bit in my head
ago 18 13:50:36 <danilos>       yeah :)
ago 18 13:50:38 <danilos>       same here
ago 18 13:50:46 <carlos>        pitti: could you talk with mdz about this?
ago 18 13:50:48 <pitti> it usually needs a few iterations :)
ago 18 13:50:53 <carlos>        yeah
ago 18 13:50:58 <pitti> carlos: about the plan in general? sure
ago 18 13:51:07 <carlos>        and the defer of the spec
ago 18 13:51:21 <pitti> we also have our admins at the allhands meeting, we'll need them, too
ago 18 13:51:28 <carlos>        I will add again the restriction to import universe packages into Rosetta 
ago 18 13:52:02 <carlos>        and request Stuart to remove the entries we already have pending to be imported
ago 18 13:52:19 <pitti> carlos: what's wrong with importing universe translations at this point?
ago 18 13:52:35 <carlos>        pitti: that people will translate something that will never be used
ago 18 13:52:48 <danilos>       not exactly never, but not before edgy+1
ago 18 13:52:55 <carlos>        that's why we didn't import it for Dapper
ago 18 13:52:58 <pitti> well, not displaying it is a different point
ago 18 13:52:59 <danilos>       and some translations may "rot" until that happens
ago 18 13:53:03 <carlos>        danilos: but we prefer people translating main
ago 18 13:53:08 <danilos>       carlos: right
ago 18 13:53:19 <pitti> carlos: OTOH I assume you can import the existing universe later without new uploads
ago 18 13:53:23 <danilos>       carlos: and that would likely move serbian off the first position :)
ago 18 13:53:36 <carlos>        pitti: it would be a bunch of work on our side that I would prefer to save if we are not going to use them right now
ago 18 13:53:48 <carlos>        pitti: no, we cannot
ago 18 13:53:55 <carlos>        pitti: do you want it backported to dapper??
ago 18 13:54:11 <carlos>        danilos: ;-)
ago 18 13:54:17 <carlos>        pitti: or edgy?
ago 18 13:54:34 <danilos>       dapper likely, if it's really to be lts
ago 18 13:54:53 <carlos>        I think it changes a lot of things, we cannot add that to a stable release...
ago 18 13:55:13 <pitti> carlos: no, probably no backports
ago 18 13:55:13 <danilos>       hum, possibly
ago 18 13:55:20 <carlos>        pitti: then?
ago 18 13:55:20 <pitti> carlos: too intrusive changes on clients
ago 18 13:55:34 <carlos>        why would you want to import universe later?
ago 18 13:55:38 <pitti> new glibc, new backend, new update-notifier, etc.
ago 18 13:55:58 <danilos>       pitti: right, so let's forget about backporting
ago 18 13:56:02 <pitti> carlos: hm, true, just ignore me on this point
ago 18 13:56:05 <carlos>        ;-)
ago 18 13:56:52 <carlos>        ok, then do we agree on adding again the restriction for universe imports for Edgy  and remove anything already in the queue for Universe?
ago 18 13:57:09 <danilos>       carlos: yeah, I guess so
ago 18 13:57:22 <danilos>       how do I find out what packages I have installed from universe? :)
ago 18 13:57:23 <carlos>        I will remove it again when Edgy is released so we can import Edgy+1's universe
ago 18 13:58:06 <carlos>        in fact... I think I'm just going to block dapper and edgy's universe
ago 18 13:58:19 <carlos>        so nothing needs to change when we open edgy+1
ago 18 13:59:09 <carlos>        ok, thanks for all
ago 18 13:59:15 *       carlos updates the spec and prepares the email 
ago 18 13:59:43 <pitti> thanks, guys
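
The model discussed in the log above can be sketched roughly as follows. This is only an illustration under stated assumptions, not an agreed design: by default every binary package registers all translation domains of its source package, a maintainer may narrow that to an exact subset, and a separate translation manager later compares a small index (domain plus counter) against what is installed. All function names, the per-language index key, and the sample package data are hypothetical and do not come from Rosetta, pkgstriptranslations, or any existing tool.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch: none of these names exist in Rosetta or in the archive tools.


def domains_for_binary(
    binary: str,
    source_of: Dict[str, str],
    source_domains: Dict[str, Set[str]],
    overrides: Optional[Dict[str, Set[str]]] = None,
) -> Set[str]:
    """Return the translation domains a binary package would register.

    Default: all domains of the binary's source package (may be slightly
    more than needed). A maintainer override narrows it to an exact subset.
    """
    if overrides and binary in overrides:
        return set(overrides[binary])
    return set(source_domains.get(source_of[binary], set()))


def pending_updates(
    wanted_domains: Set[str],
    languages: List[str],
    index: Dict[Tuple[str, str], int],
    installed: Dict[Tuple[str, str], int],
) -> List[Tuple[str, str]]:
    """Return (domain, language) pairs whose index counter is newer.

    Assumption: the index is keyed by (domain, language) and carries a
    counter; the log only says "domain and date (or counter)".
    """
    todo = []
    for domain in sorted(wanted_domains):
        for lang in languages:
            key = (domain, lang)
            available = index.get(key)
            # Fetch only what exists upstream and is newer than what we have.
            if available is not None and available > installed.get(key, -1):
                todo.append(key)
    return todo


if __name__ == "__main__":
    # Made-up data for illustration only.
    source_of = {"galeon": "galeon", "galeon-common": "galeon"}
    source_domains = {"galeon": {"galeon-2.0"}}
    wanted = domains_for_binary("galeon-common", source_of, source_domains)
    index = {("galeon-2.0", "ca"): 3, ("galeon-2.0", "es"): 1}
    installed = {("galeon-2.0", "es"): 1}
    print(pending_updates(wanted, ["ca", "es"], index, installed))
```

Running the lookup separately from package installation matches the point made in the log that translation updates should be asynchronous and should not live in postinst scripts. Domains that exist only in Rosetta, or that a binary uses but another package ships, would still need a mapping exported by Rosetta; this sketch does not cover that case.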


CategorySpec

LanguagePacksForUniverse (last edited 2008-08-06 16:20:06 by localhost)