Dev Week -- Fun with python-apt -- mvo -- Fri, Jan 23


[17:03] <mvo> hello
[17:04] <mvo> I think I'm next :)
[17:05] <mvo> the next topic is "fun with python-apt"
[17:05] <mvo> is there anyone interested in monitoring #ubuntu-classroom-chat for questions while I talk and pasting them there?
[17:06] <dinxter> yep... oops :)
[17:06] <mvo> aha, cool. I will just start then - you can ask questions anytime :)
[17:06] <mvo> python-apt exposes most of the functionality that apt-get/libapt
[17:06] <mvo> provide, via Python.
[17:06] <mvo> so everything that apt-get/synaptic and friends can do can be done via python-apt
[17:07] <mvo> let's start with an example to see how scary it really is :)
[17:07] <mvo> Here is a simple example:
[17:07] <mvo> #!/usr/bin/python
[17:07] <mvo> import apt
[17:07] <mvo> cache = apt.Cache()
[17:07] <mvo> pkg = cache["apt"]
[17:07] <mvo> print "name: ", pkg.name
[17:07] <mvo> print "installed version: ", pkg.installedVersion
[17:07] <mvo> (feel free to copy/paste that into a text file and run to see what it outputs)
[17:08] <mvo> This example illustrates the central piece of the libapt system: the
[17:08] <mvo> package cache (available as apt.Cache()). It is a fast representation
[17:08] <mvo> of the data available in /var/lib/apt/lists and is created as a
[17:08] <mvo> mmap-able binary file in /var/cache/apt/pkgcache.bin. It contains
[17:08] <mvo> information about all the available binary packages and
[17:08] <mvo> dependencies. In the python version it behaves like a dict.
[17:08] <mvo> The other piece is the package object. It contains general information
[17:08] <mvo> like name and description, version information and information about
[17:08] <mvo> the status (installed, downloadable, ..). It can also be used to
[17:08] <mvo> manipulate the state of the package.
[17:09] <mvo> Documentation is available via: "pydoc apt", "pydoc apt.Cache", "pydoc
[17:09] <mvo> apt.Package", etc. There is also a (very new and good) website
[17:09] <mvo> available here:
[17:09] <mvo>
[17:10] <mvo> any questions so far?
[17:11] <mvo> packages can be in a lot of different states
[17:11] <mvo> Here is an example to inspect package states:
[17:11] <mvo> #!/usr/bin/python
[17:11] <mvo> import apt
[17:11] <mvo> cache = apt.Cache()
[17:11] <mvo> print "available packages: ", len(cache)
[17:11] <mvo> for pkg in cache:
[17:11] <mvo>     if pkg.isInstalled:
[17:11] <mvo>         print "installed: ", pkg.name
[17:11] <mvo> print "num installed: ", len([pkg for pkg in cache if pkg.isInstalled])
[17:12] <mvo> If I'm going too fast (or too slow) please shout :)
[17:12] <mvo> This shows that the cache can be manipulated in a similar way to a
[17:12] <mvo> regular dict; it should feel "python-ish" :)
[17:14] <mvo> this example again shows that the cache is the central structure: it contains all the package objects that can be used to get information about the system
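A minimal, hypothetical model of that dict-like behaviour, runnable without python-apt installed. ToyCache and ToyPackage are invented names, not python-apt classes; their attribute names only copy the 2009-era ones used in these examples (name, installedVersion, isInstalled), and the version strings are made up.

```python
# Hypothetical stand-ins, NOT python-apt classes: they only mimic the
# dict-like behaviour of apt.Cache and a few apt.Package attributes.
class ToyPackage(object):
    def __init__(self, name, installed_version=None, candidate_version=None):
        self.name = name
        self.installedVersion = installed_version  # None means "not installed"
        self.candidateVersion = candidate_version

    @property
    def isInstalled(self):
        return self.installedVersion is not None


class ToyCache(object):
    """Maps package names to package objects, like a read-only dict."""

    def __init__(self, packages):
        self._pkgs = dict((p.name, p) for p in packages)

    def __getitem__(self, name):
        return self._pkgs[name]  # unknown names raise KeyError, as in a dict

    def __contains__(self, name):
        return name in self._pkgs

    def __len__(self):
        return len(self._pkgs)

    def __iter__(self):
        # iteration yields package objects (sorted by name), not keys
        return iter(sorted(self._pkgs.values(), key=lambda p: p.name))


# made-up versions, for illustration only
cache = ToyCache([ToyPackage("apt", "1.0", "1.1"), ToyPackage("gdebi", None, "2.0")])
print("name: %s" % cache["apt"].name)
print("num installed: %d" % len([pkg for pkg in cache if pkg.isInstalled]))
```

Iterating hands back package objects rather than names, which is why filters like the "num installed" line read as plain comprehensions.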
[17:15] <mvo> QUESTION: what is the difference between Package.installedSize and Package.installedPackageSize?
[17:15] <mvo> the package size is the size of the download (the compressed package itself)
[17:16] <mvo> the installedSize is the actual size on disk
[17:16] <mvo> of the unpacked thing
[17:16] <mvo> it may not be 100% accurate because the package may create files in its postinst or other scripts
[17:16] <mvo> but generally it's good enough
[17:17] <mvo> Let's do another example:
[17:17] <mvo> #!/usr/bin/python
[17:17] <mvo> import apt
[17:17] <mvo> progress = apt.progress.OpTextProgress()
[17:17] <mvo> cache = apt.Cache(progress)
[17:17] <mvo> print "summary: ", cache["apt"].summary
[17:17] <mvo> This shows another important concept: progress objects. Every time apt
[17:17] <mvo> does something that takes a long time (like building the caches,
[17:17] <mvo> fetching stuff from the network or installing packages) it can use
[17:17] <mvo> progress objects to give user feedback. There are some text based
[17:17] <mvo> progress objects available that can be readily used.
[17:19] <mvo> those progress objects are regular Python classes, so they can be used as the basis for fancier progress information
[17:19] <mvo> gtk or qt progress would be an example
[17:19] <mvo> currently there is text based progress available in the lib, but the jaunty version already has some gtk based progress widgets available as well
[17:19] <mvo> that should make integration even easier
[17:20] <mvo> Here is an example of how to create your own progress objects:
[17:20] <mvo> #!/usr/bin/python
[17:20] <mvo> import apt
[17:20] <mvo> class MyOpProgress(apt.progress.OpProgress):
[17:20] <mvo>     """Example class that demonstrates how a progress can be subclassed."""
[17:20] <mvo>     def update(self, percent):
[17:20] <mvo>         print "update: ", percent
[17:20] <mvo>     def done(self):
[17:20] <mvo>         print "done!"
[17:20] <mvo> progress = MyOpProgress()
[17:20] <mvo> cache = apt.Cache(progress)
[17:20] <mvo> print "summary: ", cache["apt"].summary
[17:20] <mvo> This creates a new class based on the normal progress class that
[17:20] <mvo> overrides the methods called by apt.
[17:21] <mvo> information about the methods in the progress code is available via "pydoc apt.progress"
[17:21] <mvo> there is also a lot of example code in update-manager or gdebi (big users of python-apt :)
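The progress mechanism is a plain callback pattern: the slow operation only needs an object with update(percent) and done() methods, and anything (text, GTK, Qt) can sit behind them. A self-contained sketch of that pattern follows. ToyOpProgress, RecordingProgress and build_toy_cache are hypothetical names standing in for apt.progress.OpProgress and cache building; only the update/done method names come from the example above.

```python
class ToyOpProgress(object):
    """Hypothetical base class: every callback is a no-op by default."""

    def update(self, percent):
        pass

    def done(self):
        pass


class RecordingProgress(ToyOpProgress):
    """A 'fancier' frontend; a GUI would redraw a bar instead of recording."""

    def __init__(self):
        self.seen = []
        self.finished = False

    def update(self, percent):
        self.seen.append(percent)

    def done(self):
        self.finished = True


def build_toy_cache(n_steps, progress=None):
    """Stand-in for a slow operation that reports progress as it goes."""
    if progress is None:
        progress = ToyOpProgress()  # silent default, so callers may omit it
    for step in range(1, n_steps + 1):
        # ... the real work would happen here ...
        progress.update(100.0 * step / n_steps)
    progress.done()


p = RecordingProgress()
build_toy_cache(4, p)
print(p.seen)  # [25.0, 50.0, 75.0, 100.0]
```

Since the default methods do nothing, a frontend only overrides the callbacks it cares about.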
[17:22] <mvo> all good so far? examples working :) ?
[17:24] <mvo> The python-apt system is not limited to querying; it can also
[17:24] <mvo> perform actions. Here is an example (it needs to run as root) of an
[17:24] <mvo> apt-get update-like call:
[17:24] <mvo> #!/usr/bin/python
[17:24] <mvo> import apt
[17:24] <mvo> fetchprogress = apt.progress.TextFetchProgress()
[17:24] <mvo> cache = apt.Cache()
[17:24] <mvo> cache.update(fetchprogress)
[17:24] <mvo> cache.open(None)
[17:24] <mvo> It's important to re-open the cache after it was updated (this will
[17:24] <mvo> eventually become part of update() api I think so that it happens
[17:24] <mvo> automatically).
[17:25] <mvo> while the API is pretty solid, there are still some small warts (like the open() that is required after update())
[17:25] <mvo> we are working on it :)
[17:26] <mvo> fetchprogress can again be your own class; it just has some more methods that can be overridden
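Until the re-open becomes part of update() itself, a tiny wrapper can hide that wart. The sketch below is hypothetical: update_and_reopen is an invented name that relies only on the update(fetchprogress) and open(progress) calls used in this talk, and FakeCache is a test double so the sketch runs without python-apt or root.

```python
def update_and_reopen(cache, fetch_progress, op_progress=None):
    """Hypothetical helper: refresh the package lists, then re-read them.

    Works with any object that has update(fetch_progress) and
    open(progress), the two Cache calls used in this talk.
    """
    cache.update(fetch_progress)
    # per the talk, the cache has to be re-opened after update()
    cache.open(op_progress)
    return cache


class FakeCache(object):
    """Test double: records the calls instead of touching the system."""

    def __init__(self):
        self.calls = []

    def update(self, fetch_progress):
        self.calls.append("update")

    def open(self, progress):
        self.calls.append("open")


fake = update_and_reopen(FakeCache(), fetch_progress=None)
print(fake.calls)  # ['update', 'open']
```

With python-apt, running as root, the real call would look like update_and_reopen(apt.Cache(), apt.progress.TextFetchProgress()).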
[17:26] <mvo> We can of course install stuff too:
[17:26] <mvo> #!/usr/bin/python
[17:26] <mvo> import apt
[17:26] <mvo> fetchprogress = apt.progress.TextFetchProgress()
[17:26] <mvo> installprogress = apt.progress.InstallProgress()
[17:26] <mvo> cache = apt.Cache()
[17:27] <mvo> cache["2vcard"].markInstall()
[17:27] <mvo> cache.commit(fetchprogress, installprogress)
[17:27] <mvo> cache.open(None)
[17:28] <mvo> the open() is important again so that apt knows about the changes. You could omit it and save time if the cache is not needed afterwards,
[17:28] <mvo> e.g. because your application exits after it has installed something
[17:29] <mvo> (the last line should be - thanks to maxb)
[17:30] <mvo> any change done in the cache is just "simulated" until commit() is called
[17:31] <mvo> by default python-apt will ensure that everything is consistent, i.e. if you install a package in the cache that requires dependencies, it will automatically mark them for install in the cache as well
[17:31] <mvo> This lets us do multiple installs/removals too: it will commit
[17:31] <mvo> whatever is currently set in the cache. So it's a good idea to inspect
[17:31] <mvo> the cache before we commit changes. This is done like this:
[17:31] <mvo> #!/usr/bin/python
[17:31] <mvo> import apt
[17:31] <mvo> cache = apt.Cache()
[17:31] <mvo> cache["4g8"].markInstall()
[17:31] <mvo> for pkg in cache.getChanges():
[17:31] <mvo>     if pkg.markedInstall:
[17:31] <mvo>         print "installing: ", pkg.name
[17:31] <mvo>     elif pkg.markedUpgrade:
[17:32] <mvo>         print "upgrade: ", pkg.name
[17:32] <mvo>     elif pkg.markedDelete:
[17:32] <mvo>         print "removing: ", pkg.name
[17:32] <mvo> See the "pydoc apt.package" documentation for all the available
[17:32] <mvo> states.
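The "simulate first, commit later" behaviour and the automatic marking of dependencies can be modelled in a few lines. This is a hypothetical toy, not apt's resolver: ToyDepCache and its methods are invented names, and real apt also handles versions, alternatives (a | b), conflicts and breakage, all ignored here. It only shows marking as a transitive walk over a dependency graph, with nothing applied until commit().

```python
class ToyDepCache(object):
    """Hypothetical model: marks are only a plan until commit() runs."""

    def __init__(self, depends, installed):
        self.depends = depends          # name -> list of dependency names
        self.installed = set(installed)
        self.marked = set()

    def mark_install(self, name):
        # depth-first walk: mark the package and every dependency that is
        # neither installed nor already marked (this also stops on cycles)
        stack = [name]
        while stack:
            pkg = stack.pop()
            if pkg in self.installed or pkg in self.marked:
                continue
            self.marked.add(pkg)
            stack.extend(self.depends.get(pkg, []))

    def get_changes(self):
        return sorted(self.marked)

    def commit(self):
        # the only step that changes the (pretend) system
        self.installed |= self.marked
        self.marked = set()


deps = {"app": ["libfoo", "libbar"], "libbar": ["libfoo"], "libfoo": ["libc"]}
dc = ToyDepCache(deps, installed=["libc"])
dc.mark_install("app")
print(dc.get_changes())       # ['app', 'libbar', 'libfoo']
print("app" in dc.installed)  # False: nothing has been installed yet
dc.commit()
print(sorted(dc.installed))   # ['app', 'libbar', 'libc', 'libfoo']
```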
[17:33] <mvo> python-apt gives you full control over consistency if you want: you can e.g. prevent it from automatically fixing problems and do that yourself, but it's recommended to let it run in autoFix mode :)
[17:34] <mvo> states like automatically installed dependencies (for apt-get autoremove) are also handled correctly, and commit() will do the right thing and write them out
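The "automatic" flag is what makes apt-get autoremove possible: a package installed only to satisfy a dependency can go once no manually installed package needs it any more. Below is a hypothetical mark-and-sweep sketch of that idea; autoremovable() is an invented helper, and apt's real implementation also takes things like Recommends into account, which is left out here.

```python
def autoremovable(depends, installed, auto_installed):
    """Hypothetical sketch of the idea behind `apt-get autoremove`.

    depends:        package name -> list of dependency names
    installed:      set of installed package names
    auto_installed: the subset of `installed` pulled in automatically
    Returns the automatically installed packages that no manually
    installed package needs, directly or indirectly.
    """
    manual = installed - auto_installed
    needed = set()
    stack = list(manual)
    while stack:  # "mark": everything reachable from manual packages
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(d for d in depends.get(pkg, []) if d in installed)
    return sorted(auto_installed - needed)  # "sweep": the unreachable rest


depends = {"app": ["libfoo"], "libfoo": ["libc"], "oldlib": ["libc"]}
installed = set(["app", "libfoo", "libc", "oldlib"])
auto = set(["libfoo", "libc", "oldlib"])
print(autoremovable(depends, installed, auto))  # ['oldlib']
```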
[17:35] <mvo> Each package has two versions. The installed version (if any) and the
[17:35] <mvo> candidate version. Those can be queried via:
[17:35] <mvo> #!/usr/bin/python
[17:35] <mvo> import apt
[17:35] <mvo> cache = apt.Cache()
[17:35] <mvo> for pkg in cache:
[17:35] <mvo>     if pkg.isUpgradable:
[17:35] <mvo>         print "pkg '%s': now '%s' candidate: '%s'" % (pkg.name, pkg.installedVersion, pkg.candidateVersion)
[17:35] <mvo> Each package can of course have multiple versions, but apt will show you two by default
[17:35] <mvo> the candidate is the one that you can install
[17:35] <mvo> and the installed one is self-explanatory :)
[17:36] <mvo> the candidate version can be manipulated with the apt policy or by setting it directly, but again that is not recommended; usually apt does the right thing when it calculates the candidate
[17:36] <mvo> usually the candidate is just the highest available downloadable version
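"Highest" here means Debian version ordering, not string ordering: 1.10 is newer than 1.9, an epoch (1:) beats any version without one, and ~ sorts even before the end of a string, so 1.0~rc1 is older than 1.0. The pure-Python sketch below follows dpkg's comparison rules as far as I know them; real code should use apt's own comparison from the low-level apt_pkg module instead (VersionCompare in this era's API). pick_candidate() is a hypothetical helper that ignores pinning and the rest of the apt policy.

```python
from functools import cmp_to_key


def _order(c):
    # sort weight of one character ("" stands for "end of string")
    if c == "~":
        return -1            # tilde sorts before everything, even the end
    if c == "" or c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)
    return ord(c) + 256      # other punctuation sorts after letters


def _verrevcmp(a, b):
    """Compare an upstream version or a revision string, dpkg style."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1) compare the non-digit parts character by character
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2) compare the following digit runs numerically
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i].isdigit() and j < len(b) and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1         # a's number has more digits
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version):
    # [epoch:]upstream[-revision]
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def version_compare(a, b):
    """Return <0, 0 or >0, like a classic cmp() function."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def pick_candidate(versions):
    """Hypothetical: the highest version wins (no pinning/policy)."""
    return max(versions, key=cmp_to_key(version_compare))


installed_ver = "1.0-1"
candidate = pick_candidate(["1.0-1", "1.0-2", "0.9-1"])
print("upgradable: %s" % (version_compare(candidate, installed_ver) > 0))  # upgradable: True
```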
[17:37] <mvo> One nice feature of python-apt is that you can give it an alternative
[17:37] <mvo> rootdir to look for the status file and sources.list. If you want to know
[17:37] <mvo> what warty looked like (back in the day) you could do:
[17:37] <mvo> #!/usr/bin/python
[17:37] <mvo> import apt
[17:37] <mvo> import os
[17:37] <mvo> import os.path
[17:37] <mvo> alt_root = "/tmp/warty"
[17:37] <mvo> for d in ["/etc/apt/",
[17:37] <mvo>           "/var/lib/apt/lists/partial",
[17:37] <mvo>           "/var/lib/dpkg",
[17:37] <mvo>           "/var/cache/apt/archives/partial"]:
[17:37] <mvo>     os.makedirs(alt_root+d)
[17:38] <mvo> open(alt_root+"/var/lib/dpkg/status","w")
[17:38] <mvo> open(alt_root+"/etc/apt/sources.list","w").write("deb warty main")
[17:38] <mvo> c=apt.Cache(rootdir=alt_root)
[17:38] <mvo> c.update(apt.progress.TextFetchProgress())
[17:38] <mvo> c.open(None)
[17:38] <mvo> print "warty apt version: ", c["apt"].candidateVersion
[17:38] <mvo> in this example first a bit of directory structure is created (that should probably go into the lib itself) and then the cache is opened with the alternative rootdir
[17:38] <mvo> the cache then reads the information from there instead of the main system
[17:39] <mvo> this is handy in some situations, e.g. to debug stuff, to calculate differences in the packages between multiple versions, or for historical interest :)
[17:41] <mvo> you could do e.g. a daily "what's new in jaunty" thing with it while still running hardy etc.
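As noted above, the directory setup is boilerplate that could live in a helper. A hypothetical, re-runnable version using only the standard library is sketched below: make_alt_root is an invented name, the directory list is copied from the example above, and the sources.list entry is a placeholder to be replaced with a real mirror line. The resulting directory would then be passed as apt.Cache(rootdir=...) as in the example.

```python
import os
import tempfile

# the directories the example above creates by hand
APT_ROOT_DIRS = [
    "etc/apt",
    "var/lib/apt/lists/partial",
    "var/lib/dpkg",
    "var/cache/apt/archives/partial",
]


def make_alt_root(alt_root, sources_list):
    """Hypothetical helper: prepare an apt root dir (safe to re-run)."""
    for d in APT_ROOT_DIRS:
        path = os.path.join(alt_root, d)
        if not os.path.isdir(path):
            os.makedirs(path)
    # an empty dpkg status file means "nothing installed in this root"
    status = os.path.join(alt_root, "var/lib/dpkg/status")
    if not os.path.exists(status):
        open(status, "w").close()
    with open(os.path.join(alt_root, "etc/apt/sources.list"), "w") as f:
        f.write(sources_list + "\n")
    return alt_root


# "<mirror-url>" is a placeholder, not a real archive location
root = make_alt_root(tempfile.mkdtemp(), "deb <mirror-url> warty main")
print(sorted(os.listdir(root)))  # ['etc', 'var']
```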
[17:41] <mvo> you may have noticed that the apt.package.Package object does not expose everything there is in a package record
[17:42] <mvo> (the package record is the stuff that is output via apt-cache show package)
[17:42] <mvo> e.g. apt-cache show apt has a task field
[17:42] <mvo> that is not exposed there
[17:42] <mvo> or the bugs field
[17:42] <mvo> the reason is that only the most important information is represented in the mmap structure
[17:42] <mvo> for performance reasons
[17:43] <mvo> but the information is still available, it's just a little bit slower to access :)
[17:43] <mvo> (and a lot slower if you query every package for the full record instead of the mmap-ed structure)
[17:43] <mvo> The package object does not expose all fields of a package, just the
[17:43] <mvo> most important ones. To access the others, there is the package
[17:43] <mvo> "Record". It contains the full content of the Packages entry for the
[17:43] <mvo> particular package. It can be accessed as a dict or as a string. Here
[17:43] <mvo> is an example:
[17:43] <mvo> #!/usr/bin/python
[17:43] <mvo> import apt
[17:43] <mvo> cache = apt.Cache()
[17:44] <mvo> pkg = cache["synaptic"]
[17:44] <mvo> print "Bugs entry: ", pkg.candidateRecord["Bugs"]
[17:44] <mvo> print "Full record: ", str(pkg.candidateRecord)
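A record is just the "Field: value" stanza from the Packages file, with indented lines continuing the previous field (as in long descriptions). python-apt already parses it for you via candidateRecord; the hypothetical parse_record() below only illustrates what the dict view of a record means, and the sample stanza is made up. For real parsing, python-debian's deb822 module is one existing option.

```python
def parse_record(text):
    """Hypothetical parser for one Packages stanza ("Field: value" lines).

    Lines that start with a space or tab continue the previous field,
    as in multi-line Description fields. Returns a plain dict.
    """
    record = {}
    field = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore blank lines in this sketch
        if line[0] in " \t" and field is not None:
            record[field] += "\n" + line  # continuation line, kept verbatim
        else:
            # split on the FIRST colon only: values may contain colons too
            field, _, value = line.partition(":")
            field = field.strip()
            record[field] = value.strip()
    return record


# made-up stanza, for illustration only
sample = (
    "Package: example\n"
    "Version: 1.0-1\n"
    "Bugs: mailto:bugs@example.invalid\n"
    "Description: short summary\n"
    " longer description,\n"
    " second line\n"
)
rec = parse_record(sample)
print("Bugs entry: %s" % rec["Bugs"])  # Bugs entry: mailto:bugs@example.invalid
print("fields: %s" % sorted(rec))      # fields: ['Bugs', 'Description', 'Package', 'Version']
```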
[17:45] <mvo> This covers the most important aspects of the python-apt high level
[17:45] <mvo> interface.
[17:45] <mvo> more examples (of mixed quality :P) are available in the python-apt examples folder:
[17:45] <mvo> /usr/share/doc/python-apt/examples/
[17:45] <mvo> There is also the aptsources interface to manipulate the sources.list
[17:45] <mvo> and the low level apt_pkg and apt_inst interfaces that provide a
[17:45] <mvo> direct mapping of the underlying C++ interface of libapt to Python
[17:45] <mvo> (and it's not very python-ish :). apt_pkg/apt_inst are used to implement
[17:45] <mvo> the higher level apt interface.
[17:46] <mvo> apt_pkg is nowadays no longer needed for most tasks; the apt interface covers most of it
[17:46] <mvo> apt_pkg does not follow PEP 8 (the Python code style guidelines) and is generally a bit "rough"
[17:46] <mvo> so only use it if you really need it :)
[17:46] <mvo> otherwise use "import apt"
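To give a feel for what a sources.list interface deals with, here is a hypothetical parser for one classic one-line entry (type, URI, distribution, components) that treats a commented-out "# deb ..." line as a disabled entry. parse_sources_line is an invented name, not the aptsources API (whose SourceEntry objects carry similar fields, as far as I know), and it does not handle bracketed options such as [arch=...].

```python
def parse_sources_line(line):
    """Hypothetical parser for one classic one-line sources.list entry.

    Returns None for blank lines and ordinary comments, else a dict.
    A commented-out entry ("# deb ...") comes back with disabled=True.
    Bracketed options such as [arch=...] are not handled.
    """
    text = line.strip()
    disabled = False
    if text.startswith("#"):
        text = text.lstrip("#").strip()
        disabled = True
    text = text.split("#", 1)[0]  # drop a trailing comment, if any
    parts = text.split()
    if len(parts) < 3 or parts[0] not in ("deb", "deb-src"):
        return None               # blank line or a plain comment
    return {
        "type": parts[0],
        "uri": parts[1],
        "dist": parts[2],
        "comps": parts[3:],
        "disabled": disabled,
    }


entry = parse_sources_line("deb http://archive.example.invalid/ubuntu hardy main restricted")
print("%s %s %s" % (entry["type"], entry["dist"], entry["comps"]))
# deb hardy ['main', 'restricted']
```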
[17:47] <mvo> Development power is always welcome! Projects that use python-apt are
[17:47] <mvo> update-manager, gnome-app-install, gdebi, software-properties and
[17:47] <mvo> more. If you are interested in hacking on python-apt (or any of the
[17:47] <mvo> tools that use it) you are very welcome to do so. The code is
[17:47] <mvo> maintained in bzr.
[17:47] <mvo> that was my overview about python-apt - any questions?
[17:47] <mvo> or suggestions?
[17:50] <mvo> if there are none, thank you very much for your attention and I hope you have enjoyed it :)
[17:52] <jbernard> thanks a lot, very helpful
[17:52] <mvo> let me answer the packagekit question first
[17:53] <mvo> packagekit is currently in universe, we plan to move it to main for jaunty, but we can not use it for everything
[17:53] <mvo> the problem is basically that packagekit does not allow anything like debconf during installation
[17:53] <mvo> that is fine most of the time
[17:53] <mvo> but there are packages that require some form of interaction, otherwise they fail to install
[17:54] <mvo> think of virtualbox (at least it used to need it)
[17:54] <mvo> or java
[17:54] <mvo> or stuff like this
[17:54] <mvo> so we will use it for some stuff where we know it's safe to use
[17:54] <mvo> but not for e.g. update-manager, where it may be any package (including those that require debconf)
[17:55] <mvo> but packagekit is really something on top of python-apt, its a layer higher
[17:55] <mvo> (in fact, the packagekit backend for apt is implemented using python-apt :)
[17:55] <mvo> QUESTION: i take it getting the apt.Cache() locks the cache file? if so how do i release it for other apps
[17:56] <mvo> there are the "apt_pkg.PkgSystemLock() and apt_pkg.PkgSystemUnLock()" calls for this
[17:56] <mvo> if python-apt is used as non-root then no locking is needed, that is the safest option :)
[17:57] <mvo> gdebi for example only switches to root for the actual installing, everything else is done as the user
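For the locking question: as far as I know, apt and dpkg coordinate through POSIX fcntl() locks on lock files such as /var/lib/dpkg/lock, and the apt_pkg calls above take and drop apt's system lock for you. The general pattern (hold the lock only around the critical section, so other programs can get it afterwards) is sketched below on a throwaway file; exclusive_lock is a hypothetical helper and deliberately does not touch the real apt or dpkg lock files.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def exclusive_lock(path):
    """Hypothetical helper: hold an exclusive fcntl lock on `path`.

    Fails at once with an OSError/IOError (instead of blocking) if another
    process holds the lock; the lock is always released on exit.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield path
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


lock_path = os.path.join(tempfile.mkdtemp(), "lock")
with exclusive_lock(lock_path):
    print("holding the lock; do the privileged work here")
# the lock is released here, so other programs can take it
```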
[18:01] <mvo> a last note. a pretty nice overview over the apt system is here:

MeetingLogs/devweek0901/PythonApt (last edited 2009-01-23 19:25:00 by ausimage)