Dev Week -- Fixing common ARM build failures -- janimo -- Thu, Jul 14th, 2011

   1 [18:01] <janimo> Hello everybody
   2 [18:01] <janimo> I am Jani Monoses, a member of the Canonical ARM team and will speak about ARM related build failures in this session
   3 [18:02] <janimo> feel free to ask questions as we go, on the -chat channel
   4 [18:03] <janimo> As a reference and overview of today's discussion check out this wiki page
   5 [18:03] <janimo> ARM only recently got popular enough that a fair number of developers have the hardware, but hw is still unavailable to most people
   6 [18:04] <janimo> this is the reason why there are more build failures (FTBFS from now on) on ARM than on other architectures in Ubuntu
   7 [18:05] <janimo> the situation has improved in the past cycles though, and none of these are very critical or unfixable, given developer attention
   8 [18:05] <janimo> The failures are so common and recurring that there's even a weekly 'porting jam' on #linaro every Wednesday, to deal with failing packages for a few hours
   9 [18:06] <janimo>
  10 [18:06] <janimo> to get an idea of the number of failures, check the ARM column on this page
  11 [18:06] <janimo> click around on some packages which are only red for ARM, and it will take you to the LP build log page
  12 [18:07] <janimo> there you'll encounter the reason for the FTBFS and it very likely falls into one of the categories I'll go over now
  13 [18:07] <janimo> and which are mentioned on the wiki page
  14 [18:08] <janimo> The most innocent one is that build server hardware has little RAM (<=512M) so cannot cope with some packages without entering a swapstorm
  15 [18:08] <janimo> these can be ignored as there's nothing we can do about them, short of waiting for the soon to be upgraded ARM build servers
  16 [18:09] <janimo> There are then porting issues, much like there used to be from x86 to big-endian hw, or from 32bit to 64 bit
  17 [18:09] <janimo> only these are different and a bit more varied
  18 [18:10] <janimo> most problems and failing packages are in C/C++ as that is where platform details are exposed, and are easy to overlook
  19 [18:11] <janimo> for instance char on ARM is by default unsigned, whereas on x86 it is signed, so code that assumes a signed char can fail at runtime
  20 [18:11] <janimo> when warnings are treated as errors during compiling, many such differences are caught by gcc though
  21 [18:12] <janimo> even if a bug would only manifest at runtime, the fact that many packages include a test-suite that is run as part of the build process means the build can fail on FAILed tests
  22 [18:12] <janimo> segfaults or assertions are an indication of such a bug
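A minimal C++ sketch of the char signedness issue described above. The function names are made up for illustration and are not taken from any package; the point is only that a test written against signed char silently changes meaning where plain char is unsigned.

```cpp
#include <limits>

// True when plain 'char' is signed for this target (typical for gcc on x86)
// and false when it is unsigned (the default on ARM, as noted in the session).
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Fragile: relies on 'char' being signed. Where char is unsigned the
// comparison is always false; gcc can warn about this with -Wtype-limits.
bool high_bit_set_fragile(char c) {
    return c < 0;
}

// Portable: convert to unsigned char first, so the result does not depend
// on the signedness of plain char.
bool high_bit_set_portable(char c) {
    return static_cast<unsigned char>(c) >= 0x80;
}
```

The fragile variant gives different answers for the same byte depending on the target, which is the kind of difference a test-suite run during the build can expose. gcc also has -fsigned-char and -funsigned-char to force one behaviour, though fixing the code is usually the cleaner route.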
  23 [18:13] <janimo> there are cases also where upstream just did not test on ARM, or made the build system work with x86/amd64 only
  24 [18:13] <janimo> these should not be hard to fix either
  25 [18:13] <janimo> Many of the current failures are for apps using Qt and OpenGL at the same time
  26 [18:14] <janimo> ARM platforms do not have hw accelerated OpenGL drivers, only accelerated GLES (which is a subset of modern GL)
  27 [18:14] <janimo> so on ARM we make Qt use GLES as its OpenGL rendering backend (Qt does rely on GL for some accelerated rendering, which is transparent to the app developer)
  28 [18:15] <janimo> but Qt also lets the developer use GL directly and provides a Qt surface to render onto
  29 [18:15] <janimo> when this is used and the app contains Qt code and explicit GL API calls it will break on ARM because the GLES and GL headers conflict
  30 [18:16] <janimo> these are not easy to fix, and usually need upstream to port their code to GLES in addition to desktop GL
  31 [18:16] <janimo> Another Qt gotcha that is not uncommon is the use of the type qreal which is a typedef for a floating point type
  32 [18:17] <janimo> on x86 this is a double but on ARM it is float
  33 [18:17] <janimo> code that treats 'qreal' and 'double' interchangeably will likely not build on ARM, so some explicit casts or rethinking of the types used is needed
  34 [18:18] <janimo> this is simple for plain C code, but Qt - and especially bindings - use autogenerated code which can also rely on this assumption, so one may need to dig deeper in the Qt tools and bindings to fix a certain app
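A rough sketch of how mixing 'qreal' and 'double' can break a build, written without Qt so it stands alone. The typedef real_like is a hypothetical stand-in for qreal (it is not the Qt definition), and std::min stands in for whatever template the real code calls.

```cpp
#include <algorithm>

// Hypothetical stand-in for Qt's qreal, NOT the real Qt header: float on
// 32-bit ARM, double elsewhere, following the description in the session.
#if defined(__arm__)
typedef float real_like;
#else
typedef double real_like;
#endif

// Clamp a value to at most 1.0.
real_like clamp_to_one(real_like v) {
    // std::min(v, 1.0) builds only where real_like is double: with float the
    // two arguments have different types and template deduction fails.
    // Converting the literal to real_like builds on both.
    return std::min(v, real_like(1.0));
}
```

The same pattern applies to pointers and references: a function taking double* cannot be handed the address of a float-based value, so the explicit conversion has to happen at the call site or the types have to be rethought.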
  35 [18:19] <janimo> Sometimes to expose different APIs some libs have slightly different symbols exported on ARM. So debian symbol files may need adjustment and customization from time to time, when upstream did not test ARM
  36 [18:20] <janimo> A family of failures which are luckily getting fewer are ARM architecture incompatibilities
  37 [18:20] <janimo> Ubuntu builds for ARMv7, currently the most modern variant of the ARM architecture
  38 [18:21] <janimo> For a while, since Debian defaulted to ARMv5, an older but still very widespread variant, some issues were apparent only on Ubuntu
  39 [18:21] <janimo> but now with most mobile devices using ARMv7, and hw availability in the form of devel boards, upstream updates their build systems, and ifdefs in the code to include ARMv7 too
  40 [18:22] <janimo> still if you find a package that FTBFS because it does not check for armv7 (but say only armv5) in its configure scripts, it should be a straightforward fix
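For the ifdef side of this, a small sketch of a check that covers ARMv7 as well as ARMv5. The macros are ones gcc predefines for the selected architecture; the function name and the returned labels are made up for the example, and real code would list whichever variants it actually supports.

```cpp
// Returns a label for the ARM variant the compiler targets. The macros are
// predefined by gcc; the function name is made up for this sketch.
const char *arm_variant() {
#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7__)
    return "armv7";
#elif defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5T__)
    return "armv5";
#elif defined(__arm__)
    return "arm (other variant)";
#else
    return "not arm";
#endif
}
```

Code that only tests the ARMv5 macros falls through to a generic or unsupported branch when built for ARMv7, which is usually all that the fix needs to address.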
  42 [18:23] <janimo> Many of the failures, and the hardest to fix - as it requires toolchain expertise - are those caused by gcc/binutils bugs
  43 [18:23] <janimo> the tools evolve fast and thus sometimes regressions occur
  44 [18:24] <janimo> the package may fail due to a gcc ICE (internal compiler error) or worse have bad code generated and fail in the tests
  45 [18:24] <janimo> or even worse successfully build but then cause weird segfaults in other unrelated packages, especially if it is a widely used library
  46 [18:26] <janimo> when you have one of the above scenarios, try rebuilding with a more mature version of gcc (gcc-4.4 or 4.5 currently) or without optimization (-O0) to see if it is indeed a new gcc optimization regression
  47 [18:26] <janimo> then if so, pass it on to the Linaro toolchain developers :)
  48 [18:26] <janimo> I'll take a minute or two to see if there are questions
  49 [18:28] <janimo> Or we can look randomly at any of the bugs listed on the failures page and see if it indeed fits the categories above or I lied
  50 [18:28] <janimo> thank you, what a terrific audience :)
  51 [18:29] <janimo> QUESTION: are there good reasons to export different symbols on arm
  52 [18:30] <janimo> micahg, good question. I think these are generally consequences of bugs (so accidental) or upstream needs to do this because of some other libs being used on ARM
  53 [18:30] <janimo> and so it exports a new API to reflect that backend
  54 [18:30] <janimo> I did not encounter these often
  55 [18:31] <janimo> IIRC clutter had such a case but cannot think of another offhand
  56 [18:32] <janimo> now that micahg asked I remember there's another tricky case of failures
  57 [18:32] <janimo> that of apps which generate native code or that use extensive asm code
  58 [18:33] <janimo> the latter need porting and fixing for the ARM variant we use (removing use of deprecated ARMv5 instructions, constraints on register usage) but should be straightforward
  59 [18:33] <janimo> the former though, which have their own JIT (mono, chromium, llvm) can be hard to debug
  60 [18:33] <janimo> and are usually upstream work
  61 [18:34] <janimo> debugging the interaction between the gcc we use (which can have a regression), the one upstream used, and the generated JIT code needs very good knowledge of said project's codebase
  62 [18:35] <janimo> and does not fit any of the more generic categories above
  63 [18:35] <janimo> micahg, I see in the next channel that you just wanted to ask this, nice :)
  64 [18:36] <janimo> and when the situation is so complex, one cannot be sure if it is a toolchain issue, or upstream issue or more likely a combination. The JIT-featuring apps tend to be complex in many other ways too
  65 [18:36] <janimo> So it is not surprising that such bugs are the most long-lived and when they get fixed it is not always clear why they went away
  66 [18:37] <janimo> Mono used to be very broken till natty, until NCommander and upstream managed to fix it
  67 === sean is now known as Guest89478
  68 [18:37] <janimo> chromium keeps failing too
  69 [18:38] <janimo> and Java, which is the ultimate JIT-based project, is so broken that there is no suitable open source and fast enough JVM
  70 === Guest89478 is now known as Secris
  71 [18:39] <janimo> but the vast majority of build failures are like other bugs, not too exciting and matching some patterns
  72 [18:39] <janimo> but they can only be worked on effectively if people have ARM hardware
  73 [18:39] <janimo> QUESTION: is there a guide for the register code cleanup?
  74 [18:40] <janimo> another question from micahg. You mean what I mentioned above - register constraints?
  75 [18:40] <janimo> gcc usually says something like r13 cannot be used here
  76 [18:41] <janimo> or r7 or whatever. Which means that for the target ABI you are building for, that register is used by gcc
  77 [18:41] <janimo> so I just fixed the 2 or 3 such cases by replacing it with an obviously unused register and testing that it works.
  78 [18:42] <janimo> I do not have a link, but googling for ARM EABI reserved registers or something like that should give the answers
  79 [18:43] <janimo> ARM, like x86 has reserved names for special purpose registers (stack pointer, program counter, frame pointer), but confusingly those can also be referenced by their generic names like r13, r15, r11 or similar
  80 [18:43] <janimo> so the asm code using those may not obviously be using a reserved register for general purpose computation.
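One way to sidestep the reserved-register problem, sketched with GNU inline asm. This is an illustration and not the fix applied in the cases mentioned above: instead of naming a register by hand, the operand constraints let gcc pick free ones. The function name is made up, and the ARM branch is only compiled when targeting 32-bit ARM.

```cpp
// Adds one to x. On 32-bit ARM this goes through inline asm to show the
// constraint syntax; elsewhere it falls back to plain C++.
int add_one(int x) {
#if defined(__arm__)
    int result;
    // "r" asks gcc for any general purpose register it considers free, so the
    // asm never hard-codes a register such as r7 or r13 that the compiler or
    // the ABI has reserved.
    __asm__("add %0, %1, #1" : "=r"(result) : "r"(x));
    return result;
#else
    return x + 1;
#endif
}
```

When the asm really does need a specific register, listing it in the clobber list at least tells gcc about it, and gcc will complain at build time if that register is not available.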
  81 [18:45] <janimo> gladk> QUESTION: What ARM-hardware do you usually use and can recommend?
  82 [18:45] <janimo> I and most of the ARM team use TI pandaboards for development
  83 [18:46] <janimo> as they are fast enough. But any ARMv7 that you can afford should be good. Other vendors are starting to offer <200$ devel boards
  84 [18:47] <janimo> I am happy with the panda, but did not use anything else extensively enough to recommend it. There are also toshiba ac100 netbooks which have some ubuntu images for SD floating around, those too are ok for building
  85 [18:47] <janimo> although they only have 512M of RAM compared to the panda's 1G
  86 [18:48] <janimo>  micahg has which seems to do a decent job
  87 [18:48] <janimo> for hw related question feel free to pop in #ubuntu-arm, there are many people with a variety of hw there, and you may get better answers
  88 [18:49] <janimo> With many ARM tablets and netbooks appearing and being rooted, chances are that Ubuntu is going to find its way on many of them
  89 [18:51] <ClassBot> There are 10 minutes remaining in the current session.
  90 [18:51] <janimo> While certainly not falling under the scratching their own itch category, devs can help with ARM FTBFS without actually fixing them or owning hw, by triaging bugs
  91 [18:52] <janimo> we have over 100 failures and the wiki page describes how those can be easier to manage and keep at bay
  92 [18:52] <janimo> Close invalid bugs: Some bugs may get out of date if they are filed automatically on FTBFS but then are forgotten and not closed when a new build succeeds.
  93 [18:52] <janimo> Check the issue with Debian/upstreams and forward upstream or link to upstream bugtracker patches
  94 [18:52] <janimo> Tag them for easier retrieval: they all have the ftbfs and arm-porting-queue tags, but there can be other good ways to tag (arm-build-timeout, qt-opengl-arm, etc)
  95 [18:53] <janimo> although I honestly don't see why one would do such things enthusiastically if not owning ARM hw :)
  96 [18:54] <janimo> Thanks for the questions so far, any others?
  97 [18:55] <ClassBot> There are 5 minutes remaining in the current session.
  98 [18:59] <janimo> cheers, and thanks for reading.
  99 [19:00] <janimo> That includes those reading the irclogs later :)

MeetingLogs/devweek1107/FixingARMFTBFS (last edited 2011-07-15 08:09:29 by dholbach)