Devil505
Diego
eugeni
fabiolone
Giacomo
Ingo
Jonathan
kiddo
Linux-Planet
Linuxindetails
Scurz
shredder12
teguh
TForsman
theclimber
yohoI’ve been pushing the tinderbox one notch stricter from time to time; a few weeks ago I set up the tinderbox so that any network access beside for the basic protocols (HTTP, HTTPS, FTP and RSYNC) was denied; the idea is that if the ebuilds try to access network by themselves, something is wrong: once the files are fetched, that should be enough. Incidentally, this is why live ebuilds should not be in the tree.
Now, since I’ve received a request regarding the actual network traffic issued by the tinderbox, I decided to go one step further still, and make sure that beside for the tasks that do require network access the tinderbox does not connect to anything outside of the local network. To do so, I set up a local RSync mirror, then added a squid passthrough proxy, that does not cache anything; at that point, rather than allowing some protocols on the router for the tinderbox, I simply reject anything originating from the tinderbox to access Internet; all the outgoing connections originating from the tinderbox are done through Yamato, so I have something like this in my make.conf:
FETCHCOMMAND="/usr/bin/curl --location --proxy yamato.local:3128 --output \${DISTDIR}/\${FILE} \${URI}"
RESUMECOMMAND="/usr/bin/curl --location --proxy yamato.local:3128 --continue-at - --output \${DISTDIR}/\${FILE} \${URI}"Note: googling on how to set up those two variables in Gentoo to use curl I did find some descriptions on the Gentoo Forums that provide most of them; unfortunately all I found ignore the --location option, which makes it fail to fetch stuff from the SourceForge mirrors and any other mirroring system that uses 302 Moved responses.
I also modified the bti-calling script so that the identi.ca dents are sent properly through the proxy. I didn’t set the http_proxy variable, because that would have made moot the sealing. Instead, by setting it up this way, explicitly for the fetch and dent, if any testsuite tries to fetch something, even via HTTP, will be denied.
But… why should it be a problem if testsuites were to access services on the network? Well, the answer is actually easy once you understand two rules of Gentoo: what is not in package.mask is supposed to work, and any bug found needs to be fixable, and testsuites results need to be reproducible, to make sure that the package works. When you rely on external infrastructure like GIT repositories, you have no way to make sure that if there is a problem it can be fixed; and when your testsuite relies on remote network services, it might fail because of connection problems, and it will fail if the remote service is closed entirely.
I’ve also been tempted to remove IPv4 connectivity from the tinderbox at all; IPv6 should well be enough given that it only needs to connect to Yamato, and it would be under NAT anyway..
FSWS, standing for a pretty unimaginative “Flameeyes’s Static Web Site”, is simply a bunch of XSL template files that I have developed for my own website to start with and now serves a couple others of a few friends of mine. It is a close relative to what me and Darren worked on for the xine website but it’s not derived directly; it’s rather that I took decisions based on my experience with that, and with the non-generic templates I used for my website before that.
Now, I have written before of my searching for a license for the framework a few months back, and I’ve not yet found something that works for me. I asked Matija for help but I ended without the time to mail him the details and.. what the heck, why should I mail just him the details of what I’m looking for? Isn’t this what the “social” web is all about?
So again, let’s see the specifics about this: I want FSWS to be Free Software, by any standard, and I want it to be copyleft as well: if you modify it, I’d like to see the improvements, and make use of them, since if I modify it, my changes are available to everybody to begin with. This takes out the options of the simplest licenses, such as CreativeCommons, or MIT. Ideally, GPL should do the trick, but as we all should know, it’s a license that works best with non-Web software; for the “new world” of Web, even FSF created a new license, the AGPL; I’ve used AGPL-3 before, for my rbot plugins (the bugzilla one is used by the willikins bot that is on the #gentoo channels on Freenode).
But is AGPL-3 a good choice as it is? Nope, probably not. As I noted in that previous post, using AGPL-3 would make the generated website licensed under AGPL-3 as well, since it’s template we’re talking about. As you can guess, it’s not going to make it any fun to use a similarly-licensed system. And this is something that FSF knows themselves; the equivalent “old world Unix” situation is autoconf and its M4 macro files, for which they created a licensing exception that happens to be more or less what I need for my own code.. of course, the exception as it is, is not really general enough to apply to my use case.
In general, there are just a few points I need to make sure are respected:
So now, what should I do for licensing this work, and publishing it? I want to get it right the first time, rather than deal with the fallout of bad decisions later!
I’ve ranted about overlays leaving notes about Gnome overlay I had to fight with that because of Rygel which reportedly needed the new (testing) version of Gtk+.
Now, my interest in Rygel is so that we can rid of MediaTomb in Portage; I added it myself, when I tried to make use of my PlayStation 3 for streaming video (mostly anime and Real Time with Bill Maher). It actually never worked as well as I itnended for very long; it needed proper transcode support, and what was there was incomplete. Also, the code itself was messy and hacky, with commented-out code still there, and bundled libraries. When I replaced my Samsung TV with a Sony Bravia last year, I was also hoping MediaTomb worked with that (because it actually supports UPnP by itself alone), but that wasn’t the case.
At any rate, with MediaTomb failing to keep up with releases, cleanup code, and so on so forth, I gave up vastly on the UPnP idea; even using the XBMC instance on my AppleTV, the best seems to be using Samba instead. This until a couple of weeks ago when I started worrying about my bedroom’s media outlets. I have three UPnP-enabled devices (Bravia TV, PlayStation 3 and XBox 360), but I use an always-on AppleTV to play my stuff; that really wounds like a waste.
Even more so given that the AppleTV doesn’t really play Full HD content, not with XBMC at least; and my hopes for it to become useful, actually trusting in Apple’s declaration that they would have brought TV series to the iTunes Store in Europe vanished quite a long time ago. And to reinforce the fact that I made a totally shitty deal when I bought this AppleTV, rumors have that the new version of it will be a totally different product, cheaper and with no on-board storage… now I can guess the reason for that; as I said I stream my video from my main storage (Yamato itself), but I actually am glad that the AppleTV I have has 160GB storage so I can keep a copy of my photos, and of all the music I have, in lossless format (ALAC).
At any rate, I wanted to give UPnP/PlayStation 3 another chance; and the current way to do that is using Rygel, developed by the Gnome community and tied to GStreamer (even though I have a number of personal reserves against it). Now, thankfully, most of the needed code was already around in Portage, and Petteri had a partial ebuild for an older Rygel version, so I spent a night trying to work it out. It needs the GUPnP stack that is developed together with it obviously, and it relies on Vala for a big part of it including the GUI.
The "stable’ version is from quite a long time ago; and if you know software enough, you know that “stable” means “unmaintained” when its release version ends with a “.0”. So I went for the development series, 0.7. And updating the dependencies, it turned out to need Gtk+ 2.21, with all the related trouble. Funnily, Arun notified me that the configure.ac script actually outright lies, it requires 2.21 just because it can, but it does not need it, and works fine with 2.20; I haven’t had much time to update the ebuilds so that they ignore the dependency, but I was able to test it for a little while with 2.21.
I’m sincerely not excessively impressed with it; of course it works definitely better than MediaTomb, and I guess that UPnP/DLNA are messed up enough that they have trouble to actually get them working properly, but… it seems like for the European version of the Bravia TV they always transcode to mpeg2/mp3 (which I’ll tell you, is crappy quality; the TV can do DVB-T HD out of the box, I guess it has support for decoding H.264/AAC), and even the PlayStation3 has trouble identifying some files, even when they should play correctly on local storage; and PS3 is declared to be their platform of choice.
The interface itself is quite difficult to work on, it has no way to monitor the scan status; it also only index files if the extension is one of those they recognize and…. funnily enough they recognize .mp4 files but not .m4v files, which are just the same thing; rename the latter to the former, and voiltà, it works… so you got a possible bug there. I haven’t reported it but on IRC, where I was suggested to check the config files that… are quite a bit messed up.
I’ll fix up in tree some ebuilds for Rygel at some point this weekend if I can get the time to, it’s still a pretty good replacement for MediaTomb, it’s just something I’d probably use rather than XBMC-over-Samba.
Of course this is still not solving my problem with video playback; especially since it does not work with softsub that are overly common with J-Drama…
After my ranting about QA – which brought back our lead from hiding, at least for a while, let’s see how he fares this time – a number of people asked me what they could do to help QA improve. Given that we want both to test and prevent, here comes a list of desirable features and requests that could obviously be helpful:
repoman right now does not warn about network errors in HOMEPAGE or SRC_URI; see this bug as for why; it would be a good thing to have;repoman, lacking network checks, lack a way to make sure that the herd specified in metadata.xml exists at all; it’s pretty important to do that since it happened before that herds weren’t updated at all;metadata.xml lists maintainers that have accounts in Bugzilla; with proxy-maintainership sometime it happens that either the email address is not properly updated (proxier fail) or the maintainer does not have an account on Bugzilla at all (which is against the rules!); unfortunately this requires login-access to bugzilla, and is thus a no-no right now;files/ subdirectories are still a problem for the tree, especially since they are synced down to all users and should, thus, be kept slim, if not for the critical packages; as it turns out, between Samba and PostgreSQL they use over half a megabyte for files that a lot of people wouldn’t be using at all; we also had trouble with people not understanding that the 20KB size limit on files does not mean you can split it in two 10KB files, nor compress it to keep it in tree (beside the fact we’re using CVS, there is a separate patch repository for that; Robin is working on providing a reliable patch tarball hosting as well); see bug #274853 and bug #274868stricter feature – but using them system-wide is unfeasible (I’m okay with fixing my own stuff, but I don’t want an unusable system because some other developer didn’t fix his).In general, I’m not an expert sysadmin; my work usually involves development rather than administration, but as many other distribution developers, I had to learn system administration to make sure that the packages do work on the users’ systems. This gets even messier when you deal with Gentoo and its almost infinite amount of combinations.
At any rate, I end up administering not only my local systems, but also two servers (thanks to IOS Solutions who provides xine with its own server for the site and bugzilla). I started using lighttpd, but for a long series of circumstances I ended up moving to Apache (mostly content negotiation things). I had to learn the hard way about a number of issues — luckily security was never involved.
My setup moved from a single lighttp instance to one Apache that kept running two static websites, one Bugzilla and one Typo instances, to two Apache on two servers, one running a static website and a Bugzilla instance, the other running a few static websites and a Typo instance via passenger. The latter is more or less what I have now.
From one side, Midas is keeping up the xine website (static, generated via XSLT after commit and push); from the other, Vanguard – the one I pay for – keeps this blog my website and a few more running. I used to have a gitweb instance (and Gitarella before that), but I stopped providing the git repositories myself, much easier to push them to Gitorious or GitHub as needed.
The static websites use my own generator for which I still have to find a proper license. Most of these sites are mine or simply of friends of mine, but with things changing a bit for me, I’m going to start offering that as a service package for my paying customers (you have no idea how many customers would just be interested in having a simple, static page updated once in a few months… as long as it looked cool).
But since I have, from time to time, to stop Apache to make changes to my blog – or in the past Passenger went crazy and Apache stopped from answering to requests at all – I’m not very convinced about running the two alongside for much longer. I’ve then decided it was a good idea to start figuring out an alternative approach; the solution I’m thinking of requires the use of two Apache instances on the same machine; since I cannot use different ports for them (well, I could run my blog over 443/SSL but I don’t think that would be that good of an idea for the read-only situation), I’ve now requested a second IP address (the current vserver solution I’m renting should support up to 4), and I’ll run two instances with that.. over the two different IP addresses.
Now, one of the nice things of splitting the two instances this way is that I don’t even need ModSecurity on the instance that only serves the static sites; while they are not really as static as a stone (I make use of content negotiation to support multiple languages on the same site, and mod_rewrite to set the forcing), there is no way that I can think of that any security issue is triggered while serving them. I could even use something different from Apache to serve them, but the few advanced features I make use of don’t make it easy to switch (content negotiation is one, another is rewritemaps to recover moved/broken URLs). And obviously, I wouldn’t need Passenger either.
But all the other modules? Well, those I’d need; and since by default they are actually shared modules (I have ranted about that last November), loading two copies of them means duplicating the .data.rel and the other Copy on Write sections. Not nice. So I finally bit the bullet and, knowing that Apache upstream allows using them as builtin, I set out to find if the Gentoo packaging allows for that situation. Indeed it does, but by mishandling the static USE flag which made it quite harder to find out. After enabling that one, disabling the mem_cache, file_cache and cache modules (that are not loaded by default but are still built, and would be built-in when using the static USE flag), and restarting Apache, the process map looked much better, as now the apache2 processes have quite less files open (and thus a much neater memory map).
One thing that is interesting to note: right now, I’ve not been using mod_perl for Bugzilla because of the configuration trouble; one day I might actually try that. Possibly with a second Apache instance on Midas, open only on SSL, with a CACert certificate.
Now it might very well be possible that you were to need a particular module only in one case, such as mod_ssl to run a separate process for an SSL-enabled Apache 2 instance… in that case, one possible solution, even though not extremely nice, is to use the EXTRA_ECONF trick that I already described.. in this case, you could create a /etc/portage/env/www-servers/apache file with this content:
export EXTRA_ECONF="${EXTRA_ECONF} --enable-ssl=shared"On a separate note, I think one of the reasons why our developers let the default be dynamic modules is more related to the psychology of calling it “shared”. It makes it sound like it’s wasting memory when you have multiple processes using a “non-shared” module.. when in reality you’re creating much more private memory mappings with the shared version. Oh well.
Unfortunately, as it happens, the init system we have in place does not allow for more than one Apache system to be running; it really requires different configuration files and probably a new init script, so I’ll have to come back to this stuff in the next days, for the remaining parts.
There are though three almost completely unrelated notes that I want to sneak in:
sshd sessions, and I can’t find any usefulness in them for most desktop systems, or single-user managed servers (I definitely don’t leave a motd to myself).www or some other third-level domain: almost no DNS service seem to actually allow for CNAME to be used on the origin record (that is, the second-level domain); this means that you end up with the two-levels domain to point directly to an IP, and changing a lot of those is not a fun task, if you’re switching the hosting from one server to another.So a week or so ago I masked Webmin, because it dinged a lot on my tinderbox and I decided to take a look at it. The ChangeLog shows a very bleak story: webmin hasn’t had a dedicated maintainer since March 2008; that’s two and a half years ago; the last five versions were bumped by Patrick (who, if you should be reminded, rarely picks up the pieces of what he might break); last October, Victor “fixed” a bunch of sandbox violations in the 1.490 version.
Given the hits on the tinderbox and the above-noted history; I decided to mask it until somebody actually stepped up to maintain it; this was also positively acknowledged by the security team: having a package with the security issue of Webmin lacking a dedicated maintainer is calling for trouble.
Since then I received a long list of email messages of users using Webmin, either in production or for local environments, wondering why I masked it and insisting that it is in its design to access files owned by other packages anyway. I guess I’ll have to rephrase the masking reason given that, that wasn’t what I meant.
It is a common, enforced policy in Gentoo that all the packages are self-contained, and mess the least possible with the running system as they can, so that if you just emerge MySQL, it won’t start up right away, or even at the next reboot; this is one thing that put Gentoo drastically in contrast with most other distributions. And what a lot of people, me included, love the most.
If there are commands to execute to properly setup the package before making use of it, we have the special pkg_config() function that can be called through emerge --config — there aren’t many packages doing that though.
Instead, webmin-1.510.ebuild goes the totally wrong way: it tries to second-guess the user’s setup; it runs the setup step of Webmin directly within src_install() which should be protected by the sandbox; but Victor’s “fixes” actually simply allowed the ebuild to access what it shouldn’t be, not at that state at least: real devices, kernel modules, and the cron configuration.
How did I notice that? Very simple, actually: the tinderbox, just like my own system, uses fcron as cron daemon. The fcron configuration file is not predicted within the ebuild, so it still triggered sandbox violation, and caused my tinderbox to start screaming at me like my mother when I come back at 6am (don’t get me started).
Now, if that was the only problem it wouldn’t be much; but the ebuild also fail to actually provide a decent/stable PAM configuration, and a lot of the shell code in the ebuild file is just… icky.
What can be done to save webmin? Well, more than certainly it needs a new maintainer, one who does a lot more than copying a file and running cvs add on it. The new maintainer will have to rewrite most of the ebuild, implement the good parts of it as pkg_config and make sure that it respects the user’s settings.
For most other packages, like I did for Ruby packages, I would have said that I’m happy to be hired to take care of the problematic package, with a reasonable fee to keep maintaining it from then on. But given it’s Webmin we’re talking about.. I’m not sure if I’m ready to pick it up. If somebody else actually uses it, and feels like maintaining it, it’d be best. I’m still open to be the proxying maintainer if somebody feels like pick up the pieces of it all, but wants to have somebody else reviewing the ebuild before it get pushed to users.
You probably know how I feel about Autotools books given I actually wanted to write one, was mostly rejected, and ended up working on spare time on Autotools Mythbuster — which I really should write more on. Definitely the “Autobook” needed a rehash, that was obvious. The spot was taken not by me but instead by John Calcote, with his Autotools: A Practical Guide To GNU Autoconf, Automake, And Libtool — okay I’m a bit envious of that, but not excessively.
At any rate, No Starch Press has been tremendously kind and offered me a review copy – I did contact them to propose my guide as a book, and they did tell me they had a book already in the works.. John’s – which I accepted gladly. I actually hoped to post the review much sooner, but between job tasks, and Gentoo messes, time was a rare commodity. And to be honest, it felt a lot like reading a textbook when you know already the subject. Don’t get my tone wrong. I find John’s work very stimulating: for a newcomer to the world of build systems, it definitely goes in much deeper detail than most documents I’ve seen before; he also does something that I strive for myself: described all the related topics: make, autoconf, automake and libtool, plus a number of related, but not directly-involved, topics.
For instance, he goes into two important topics that I have only written about in the blog up until now — Position Independent Code and library versioning — and then provides a whole chapter dedicated to tip and tricks, not only tied to Autotools. Some of the tricks were new also to me, and I’ve been floating in this topic for the best part of the past seven years (even before I joined as a Gentoo developer). Some further insights also show that he’s not tied to the Unix world itself, which is always a positive thing, as a lot of time we self-validate ourselves and can’t think outside that limited box.
Given these property, I couldn’t find any reason not to list it on the further readings section of my own guide.
I could stop here with the review, and possibly make John and No Starch pretty happy, but I’d feel like I “sold” myself for the hope to actually keep a good contact with them as a publisher. But it wouldn’t be right for them either; I owe them my full opinion. Thus, I have now to move a few personal grudges I have with John’s approach, without reducing the sheer importance that this book has for all of us developers of Free Software that work with autotools daily: he stuck too much with the point of view of the Autotools’ upstream maintainers.
Don’t get me wrong, it’s obvious that diverting from the authors’ intention for any kind of software is bad, but I have fiercly criticized them before about their failure to get to agree to what should be used together. They insist that autoconf and libtool can be used standalone, without automake at all, they provide a number of support macros, but not all those you’d be expecting them to, and so on so forth. All these things actually make me pretty sad, and that’s why I’m trying my best on my text to actually write of a complete build system, using all three of them, plus pkg-config, which John mentions just in passing in the book, and that, as usual I’ll add, feels like a bastard child of the rest of the projects.
It actually pours more than just at that level; while he acknowledges the (must read) Recursive Make Considered Harmful by Peter Miller, he then states (I quote) “[T]he sheer simplicity of implementing and maintaining a recursive build system makes it, by far, the most widely used form of build system.” – Well, I cannot dispute the fact that it is the most widely used form; I can definitely argue a lot about the “simplicity”, especially considering that recursive build systems have atrocious support for parallelisation, and with modern machines growing more in number of cores (or execution threads) than they do in pure speed, that is an important detail to consider; especially for big projects.
Another thing that baffled me is that John is able to describe the Autoconf Archive without moving a comment about the policies regarding bundled and non-bundled macros in autoconf … add to that the way he dismisses most of the mistakes in the official documentation, or the idiosincratic behaviour of some macros, and that is what I call “siding with the authors” (by itself, of course, there is nothing wrong; he’s much more objective than I could ever be I guess). For those who want to take a laugh now, John’s book refers to the Autoconf Archive website; the Autoconf Archive website then links back to Autotools Mythbuster. Heh.
It can be opinable I guess, but for me, despide its subtitle defines it as a “Practical Guide”, I find it a good theoretical textbook; but I disagree that you should be using Autotools the way they are describe by John, or by the official documentation for what matters; like it or not, a lot of the technical decisions in those projects are taken after a political stance, and that shows on some recommendations that to me only hinder development and adoption of the tools. And in the current landscape where cmake is still preferred to build under Windows, and, luckily for all of us, scons is finally disappearing, dropping the politics, and simply provide a good pragmatic approach to practice is what I was hoping for.
Final words and sound bite/quotable opinion? John’s is an important book, as I already noted. It shows a stirring ecosystem of people working around Autotools; even though we disagree on views and in some technical details, his is the only current and complete text on the tools of the trade. If you’re a newcomer who want to know how it works behind the scenes in detail, you just have to read it. But if you’re interested in writing a good build system by modern standard, you cannot just stop here. John is showing you the door; but you’ve got to walk through it and proceed to the end of the corridor… deciding which further doors to peek into, and which ones to keep shut.
One unfortunately still too common practice I find in my fellow developers is to rely too much on overlays to get users involved; my personal preference on this matter is getting people to proxy-maintain packages, and the reason is that this way I can make sure the fixes propagate to all the other users in a timely matter, as well as being able to intercept mistakes before I commit them to the tree.
But there are other reasons why I dislike overlays; for instance, they often clash enough with each other, or mix should-be-working packages with don’t-even-try … which is the case of the current Gnome overlay. I used to use the Gnome overlay so I could test and help reporting bugs before the next release hit ~arch; unfortunately since a while ago now, the overlay contains Gnome3/Gtk+3 packages that really shouldn’t be mixed in on a system that is actually used.
This became obnoxious to me the moment I went to actually try Rygel (so that I could actually get rid of MediaTomb, if it worked and I could add it to the tree — that code is noxious!). The problem is not in the high reliance of Rygel on Vala, that would be good enough, given that we have it in tree; the problem is that the Rygel UI (and after trying it out I can safely say that you don’t want to try without the UI) requires either Gtk+3 (no way!) or Gtk+ 2.21… which is the “devel” branch and is present only on the gnome overlay. Not even masked in tree.
It wouldn’t have been too bad, if it wasn’t that upstream (finally!) split gdk-pixbuf from gtk+ itself, so you should finally be able to use librsvg without X11 on the system (which is why my charts are available only as SVG and cannot be seen by some browsers who have trouble displaying embedded SVG). Unfortunately, this also means that they changed the path gdk-pixbuf uses to load the loaders (no pun intended); and the current ~arch librsvg won’t pick that up. Again, the librsvg in the overlay has automagic-deps trouble, and require both Gtk+2 and Gtk+3 to be present to work. D’oh!
This is nothing new, what is the problem? Well, ostensibly beside the fact that Arun blogs about something we can’t have ;) — not your fault, I know, not picking on you, don’t worry Arun!
The problem comes when I’ve asked before why the Gnome stuff is not pushed in main tree under p.mask, like most other teams work, especially given that I can make use of the tinderbox to check reverse dependencies before it’s unleashed, rather than have to report them afterwards. Indeed, Gtk+ and other libraries’ updates tend to be quite boring because there is way too much software that define GTK_NO_DEPRECATED and similar, which should only be used during development, and thus fail when stuff they us do get deprecated. Of course even if they didn’t define that, the code would be failing at the following update when they get removed, but that’s beside the point now.
Interestingly enough, though, the effect of (at least some) more recent deprecation seem to be causing the same kind of issue (if, by the mere fact we’re talking about Gtk+, to a lesser severity) of the recent glibc-2.12 release in the form of undefined symbols where GTK_* macros are used.
As you might suspect the tinderbox already stumbled across a few of these packages; while the quick-fix is generally quick (drop the NO_DEPRECATED definition), the complete fix (use the correct non-deprecated API) takes a while, and I can’t blame the maintainers for waiting to hear from upstream on that matter, especially given the way gtk+ is always dropped like a bombshell. Just to be on the safe side, I’ve now added some further tests to ensure that neither the “symbols” requirements caused by gtk+-2.20 nor those caused by glibc-2.12 will be left standing without further notice. If the tinderbox will ever build such a broken package, it’ll be reporting it to me so that I can file the proper bug.
Now, the gtk+-2.21 situation seem to start just as well; gtkhtml fails to build and it’s even part of Gnome itself. I will be begging the Gnome team again, starting from here, for them to add the ebuilds as soon as they are usable in main tree, under p.mask, so that the tinderbox can start churning at them.
But since people seem to think I write too much negatively, I have to say that at least a few developers seem to actually keep in mind there is the tinderbox available; Samuli, Alexis and Jeroen asked for feedback before unmasking (XFCE, Ocaml and libevent-2 respectively), and the problems found have been taken care of much more quickly then even I expected, for the most part. So if you’re a package maintainer and want the reverse dependencies of your package tested before unmasking a version into ~arch, just drop me a mail and I’ll set up a special run. It can take anything between half a day to a week or two depending on the size of the revdep tree and the queued up runs (right now it’s completing a full-system-set rebuild to see if there were more issues with glibc-2.12; turns out I only hit another problem and that was related to GNU make 3.82 instead (another “good” scary bump).
After this post, you can guess the following run is going to target gtk+-2.20. If there are no further runs, after that it’ll resume the daily-build of the tree.
Please keep in mind though: package.mask-ed packages are fine, but the tinderbox will not test any overlay. Get your fixes in the tree proper!
I’ve recently ranted about the fact that GLIBC 2.12 was added to portage in a timeframe that, in both my personal and professional opinion, was too short. I’ve not dug too much into what the problems with that particular version, and why I think it needed to stay in package.mask for a little longer.
The main reason why we’re seeing failures is because the developers have been, for a few versions already, cleaning up the headers so that including only one of them won’t cause a number of other, near-unrelated headers to be included. This is the same problem as I described with -O0 and libintl.h almost two years ago.
Now, on FreeBSD, and Mac OS X by reflection, a number of these cleanups have been done a long time ago, or the problem was never introduced: they both try to stick to the minimal subset of interfaces you need to bring in. This is why half the time what you have to do to port something to FreeBSD is just adding a bunch of header inclusions. This should mean the whole situation is easily handled, and should already be fixed in most situations, but that’s far from the case. Not only, for no good reason at all, a number of projects protect the inclusion of extra headers with #ifdef calls for FreeBSD and Mac OS X, but one of the particular headers cleaned up causes a much worse problem than the usual “foo not defined” or constants not found.
The problem is that the stat.h inclusion has been dropped by a number of headers, which means it has to be added back to the code itself. Unfortunately, since C allows for implicit declarations (Portage does warn of them!), this means that a call of S_ISDIR(mymode) and similar will appear to the compiler like an implicit function declaration, and will then emit a requirement for the (undefined) symbol S_ISDIR… which of course is not a function but a macro defined in the header file. Again, this is not as troublesome as it looks: the linker will catch the symbol as undefined and halt the build, in the best of cases. And just to make sure, Portage logs these problem further, stating that they can create problems at runtime — of course that would be more useful if developers actually paid attention and fixed them, but let’s not go there for now.
The real problem come when this kind of mistake is present in shared objects; by default ld will allow shared objects to have undefined reference, especially if they are plugins (since then the symbols may be resolved by the host program, if not by a library that they can be linked to). Of course, most sane libraries will be using --no-undefined for the final link… but not all of them do so. A common problem situation is Ruby and its extensions – at least outside of Gentoo, given that all our current Ruby ebuilds force --no-undefined at extension link time. The only warning you have there is the implicit-declaration of the fake-function, but that’s far from being difficult to oversee.
And before you suggest that, no -Werror-implicit-declaration is not going to be a good idea: most of the autoconf tests fail if that is passed, which result in a non-buildable system; that’s why it’s one of the few flags I play with on all my autoconf-based projects!
Also, as Ruby itself proved – ask me again why I always rant when I write about Ruby – there are ways around the warning that don’t constitute a fix even for the most optimistic of the developers.
But then, yuo have the million euro question: how severe is this bug? The only way to judge that is to understand what symptoms it causes. Given that’s not a security feature, you’re not going to have security bugs here, but you have:
stat(2) handling code!There is a good thing though: -Wl,-z,now which is part of the default settings for hardened profiles, or that can be set at runtime by setting LD_BIND_NOW=1 in the environment, will ensure that all the symbols are bound when starting up the process, rather than lazily while the code executes; it can reduce the risk of the missing symbol to be hit in the middle of the transaction. Unfortunately it does not work the same way with extensions for languages like Ruby and Python, but at least alleviate a bit the problem.
Combine what I just wrote with the fact that even a package part of the system set (m4, used by autoconf, not something you can live without) failed to build with this glibc version by the time it went unmasked, showing a lack of system rebuild on the system of the developer choosing to unmask it, and you might guess why I ended up using such strong words in my previous post.