== Dev Week -- Fixing common ARM build failures -- janimo -- Thu, Jul 14th, 2011 ==
{{{#!irc
[18:01] Hello everybody
[18:01] I am Jani Monoses, a member of the Canonical ARM team and will speak about ARM related build failures in this session
[18:02] feel free to ask questions as we go, on the -chat channel
[18:03] As a reference and overview of today's discussion check out this wiki page https://wiki.ubuntu.com/ARM/FTBFS
[18:03] ARM only recently got popular enough so that enough developers have them, but still hw is unavailable to most people
[18:04] this is the reason why there are more build failures (FTBFS from now on) on ARM than on other architectures in Ubuntu
[18:05] the situation has improved in the past cycles though, and none of these are very critical or unfixable, given developer attention
[18:05] The failures are so common and recurring that there's even a weekly 'porting jam' on #linaro every Wednesday, to deal with failing packages for a few hours
[18:06] http://qa.ubuntuwire.com/ftbfs/
[18:06] to get an idea of the number of failures, check the ARM column on this page
[18:06] click around on some packages which are only red for ARM, and it will take you to the LP build log page
[18:07] there you'll encounter the reason for the FTBFS and it very likely falls into one of the categories I'll go over now
[18:07] and which are mentioned on the wiki page
[18:08] The most innocent one is that build server hardware has little RAM (<=512M) so cannot cope with some packages without entering a swapstorm
[18:08] these can be ignored as there's nothing we can do about them, short of waiting for the soon to be upgraded ARM build servers
[18:09] There are then porting issues, much like there used to be from x86 to big-endian hw, or from 32bit to 64 bit
[18:09] only they are different and a bit more varied
[18:10] most problems and failing packages are C/C++ as that is where platform details are exposed, and are easy to overlook
[18:11] for instance char on ARM
is by default unsigned, whereas on x86 it is signed, so such an assumption can lead to program failure at runtime
[18:11] when warnings are treated as errors during compiling, many such differences are caught by gcc though
[18:12] even if a bug would only manifest at runtime, the fact that many packages include a test-suite that is run as part of the build process means the build can fail on FAILed tests
[18:12] segfaults or assertions are an indication of such a bug
[18:13] there are cases also where upstream just did not test on ARM, or made the build system work with x86/amd64 only
[18:13] these should not be hard to fix either
[18:13] Many of the current failures are for apps using Qt and OpenGL at the same time
[18:14] ARM platforms do not have hw accelerated OpenGL drivers, only accelerated GLES (which is a subset of modern GL)
[18:14] so on ARM we make Qt use GLES as its OpenGL rendering backend (Qt does rely on GL for some accelerated rendering, which is transparent to the app developer)
[18:15] but Qt also lets the developer use GL directly and provides a Qt surface to render onto
[18:15] when this is used and the app contains Qt code and explicit GL API calls it will break on ARM because the GLES and GL headers conflict
[18:16] these are not easy to fix, and usually need upstream to port their code to GLES in addition to desktop GL
[18:16] Another Qt gotcha that is not uncommon is the use of the type qreal which is a typedef for a floating point type
[18:17] on x86 this is a double but on ARM it is float
[18:17] code that treats 'qreal' and 'double' interchangeably will likely not build on ARM, so some explicit casts or rethinking of the types used is needed
[18:18] this is simple for plain C code, but Qt - and especially bindings - use autogenerated code which can also rely on this assumption, so one may need to dig deeper in the Qt tools and bindings to fix a certain app
[18:19] Sometimes to expose different APIs some libs have slightly different
symbols exported on ARM. So debian symbol files may need adjustment and customization from time to time, when upstream did not test ARM
[18:20] A family of failures which are luckily getting fewer are ARM architecture incompatibilities
[18:20] Ubuntu builds for ARMv7, currently the most modern variant of the ARM architecture
[18:21] For a while, since Debian defaulted to ARMv5, an older but still very widespread variant, some issues were apparent only on Ubuntu
[18:21] but now with most mobile devices using ARMv7, and hw availability in the form of devel boards, upstream updates their build systems, and ifdefs in the code to include ARMv7 too
[18:22] still if you find a package that FTBFS because it does not check for armv7 (but say only armv5) in its configure scripts, it should be a straightforward fix
[18:23] Many of the failures, and the hardest to fix - as it requires toolchain expertise - are those caused by gcc/binutils bugs
[18:23] the tools evolve fast and thus sometimes regressions occur
[18:24] the package may fail due to a gcc ICE (internal compiler error) or worse have bad code generated and fail in the tests
[18:24] or even worse successfully build but then cause weird segfaults in other unrelated packages, especially if it is a widely used library
[18:26] when you have one of the above scenarios, try rebuilding with a more mature version of gcc (gcc-4.4 or 4.5 currently) or without optimization (-O0) to see if it is indeed a new gcc optimization regression
[18:26] then if so, pass it on to the Linaro toolchain developers :)
[18:26] I'll take a minute or two to see if there are questions
[18:28] Or we can look randomly at any of the bugs listed on the failures page and see if it indeed fits the categories above or I lied
[18:28] thank you, what a terrific audience :)
[18:29] QUESTION: are there good reasons to export different symbols on arm
[18:30] micahg, good question.
I think these are generally consequences of bugs (so accidental) or of upstream needing to do this because of some other libs being used on ARM
[18:30] and so it exports a new API to reflect that backend
[18:30] I did not encounter these often
[18:31] IIRC clutter had such a case but cannot think of another offhand
[18:32] now that micahg asked I remember there's another tricky case of failures
[18:32] that of apps which generate native code or that use extensive asm code
[18:33] the latter need porting and fixing for the ARM variant we use (removing use of deprecated ARMv5 instructions, constraints on register usage) but should be straightforward
[18:33] the former though, which have their own JIT (mono, chromium, llvm) can be hard to debug
[18:33] and are usually upstream work
[18:34] the interaction and failure between the gcc we use (which can have a regression), the one they used, and the generated JIT code needs very good knowledge of said project codebase
[18:35] and does not fit any of the more generic categories above
[18:35] micahg, I see in the next channel that you just wanted to ask this, nice :)
[18:36] and when the situation is so complex, one cannot be sure if it is a toolchain issue, or upstream issue or more likely a combination.
The JIT-featuring apps tend to be complex in many other ways too
[18:36] So it is not surprising that such bugs are the most long-lived and when they get fixed it is not always clear why they went away
[18:37] Mono used to be very broken till natty, until NCommander and upstream managed to fix it
=== sean is now known as Guest89478
[18:37] chromium keeps failing too
[18:38] and Java, which is the ultimate JIT-based project, is so broken that there is no suitable open source and fast enough JVM
=== Guest89478 is now known as Secris
[18:39] but the vast majority of build failures are like other bugs, not too exciting and matching some patterns
[18:39] but they can only be worked on effectively if people have ARM hardware
[18:39] QUESTION: is there a guide for the register code cleanup?
[18:40] another question from micahg. You mean what I mentioned above - register constraints?
[18:40] gcc usually says something like r13 cannot be used here
[18:41] or r7 or whatever. Which means that for the target ABI you are building that register is used by gcc
[18:41] so I just fixed the 2 or 3 such cases by replacing with an obviously unused register and testing that it works.
[18:42] I do not have a link, but googling for ARM EABI reserved registers or something like that should give the answers
[18:43] ARM, like x86, has reserved names for special purpose registers (stack pointer, program counter, frame pointer), but confusingly those can also be referenced by their generic names like r12, r15, r13 or similar
[18:43] so the asm code using those may not obviously be using a reserved register for general purpose computation.
[18:45] gladk> QUESTION: What ARM-hardware do you usually use and can recommend?
[18:45] I and most of the ARM team use TI pandaboards for development
[18:46] as they are fast enough. But any ARMv7 that you can afford should be good.
Other vendors are starting to offer <$200 devel boards
[18:47] I am happy with the panda, but did not use anything else extensively enough to recommend it. There are also toshiba ac100 netbooks which have some ubuntu images for SD floating around, those too are ok for building
[18:47] although they only have 512M of RAM compared to the panda's 1G
[18:48] micahg has https://www.genesi-usa.com/store/details/12 which seems to do a decent job
[18:48] for hw related questions feel free to pop in #ubuntu-arm, there are many people with a variety of hw there, and you may get better answers
[18:49] With many ARM tablets and netbooks appearing and being rooted, chances are that Ubuntu is going to find its way onto many of them
[18:51] There are 10 minutes remaining in the current session.
[18:51] While certainly not falling under the scratching their own itch category, devs can help with ARM FTBFS without actually fixing or owning hw but by bug triaging
[18:52] we have over 100 failures and the wiki page describes how those can be easier to manage and keep at bay
[18:52] Close invalid bugs: Some bugs may get out of date if they are filed automatically on FTBFS but then are forgotten and not closed when a new build succeeds.
[18:52] Check the issue with Debian/upstreams and forward upstream or link to upstream bugtracker patches
[18:52] Tag them for easier retrieval: they all have the ftbfs and arm-porting-queue tags, but there can be other good ways to tag (arm-build-timeout, qt-opengl-arm, etc)
[18:53] although I honestly don't see why one would do such things enthusiastically if not owning ARM hw :)
[18:54] Thanks for the questions so far, any others?
[18:55] There are 5 minutes remaining in the current session.
[18:59] cheers, and thanks for reading.
[19:00] That includes those reading the irclogs later :)
}}}
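
The [18:11] remark about char signedness can be illustrated with a small C++ sketch. It is not taken from any package discussed in the session; the function names are made up for illustration, and it assumes only what the log states: plain char is unsigned by default on ARM and signed on x86.

```cpp
#include <climits>

// True when plain `char` is signed for the current target.
// CHAR_MIN is 0 where char is unsigned and negative where it is signed.
constexpr bool plain_char_is_signed() {
    return CHAR_MIN < 0;
}

// Non-portable test for "byte has its high bit set" (hypothetical example).
// It relies on char being signed, so it is always false where char is unsigned.
constexpr bool high_bit_set_nonportable(char c) {
    return c < 0;
}

// Portable version: convert to unsigned char first, then compare.
// The result no longer depends on the platform's default for plain char.
constexpr bool high_bit_set(char c) {
    return static_cast<unsigned char>(c) >= 0x80;
}
```

Code written like `high_bit_set_nonportable` is the kind that may pass its test-suite on x86 and fail it on ARM. Converting to `unsigned char` (or declaring the variable `signed char` where negative values are intended) states the assumption explicitly.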
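
The qreal point from [18:16] to [18:17] can be sketched without Qt. In the block below `qreal` is a local stand-in typedef set to float to model the ARM build described in the log, and `smaller` and `clamp_width` are hypothetical helpers, not Qt functions. The sketch shows one common shape of the problem: a template that takes two arguments of the same type stops compiling once qreal and double are different types.

```cpp
// Local stand-in for Qt's qreal (an assumption of this sketch, not Qt's header).
// float models the ARM build from the log; double would model x86.
typedef float qreal;

// Single-type template in the style of std::min (hypothetical helper).
template <typename T>
constexpr T smaller(T a, T b) {
    return b < a ? b : a;
}

// smaller(width, 1.5) would not compile here: the compiler deduces T as
// float from `width` and as double from the literal, and rejects the call.
// Casting the literal to qreal builds whichever type qreal happens to be.
constexpr qreal clamp_width(qreal width) {
    return smaller(width, static_cast<qreal>(1.5));
}
```

The same cast is harmless when qreal is double, so a fix of this shape does not need an ARM-only ifdef.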