Or see this on YouTube; and yes, it is quite ironic that we’ve uploaded to YouTube the visualised history of a streaming software stack.
The video you can see here is the GIT history of feng, the RTSP streaming server Luca and I are working on for the LScube project (previously founded by the Turin Polytechnic), visualised through Gource.
While this video has some insights about feng itself, which I’ll discuss on the mailing list of the project soon enough, I’m using this to bring home another point, one even more important I think. You probably remember my problems with ATI HD4350 video card … well, one of the reasons why I didn’t post this video before, even though Gource has been in tree (thanks to Enrico) for a while already, is that it didn’t work too well on my system.
It might not be too obvious, but the way Gource works is by using SDL (and thus OpenGL) to render the visualisation to screen and to (PPM) images; the video is then produced by FFmpeg, which takes the sequence of PPM images and encodes it in H.264 with x264 (no, I’m not going to do this with Theora), so you rely on your OpenGL support to produce good results. When 0.24 was being worked on (last January), the r700 Radeon driver, with KMS, had some trouble, and you’d see a full orange or purple frame from time to time, resulting in a not-too-appealing video. Yesterday I bit the bullet and, after dmesg showed me a request from the kernel to update my userland, I rebuilt the Radeon driver from GIT, and Mesa from the 7.8 branch…
Perfect!
No crashes, no artefacts on glxgears, and no artefacts on Gource either! As you can see from the video above. This is with kernel 2.6.33 vanilla, Mesa 7.8 GIT and Radeon GIT, all with KMS enabled (and the framebuffers work as well!). Kudos to Dave, and all the developers working on Radeon, this is what I call good Free Software!
Just do yourself a favour, and don’t buy video cards with fans… leaving aside nVidia’s screwup with the drivers, all of them failed on me at some point; passive cards instead seem to work much longer, probably because of the lack of moving parts.
Since I publicly explained my issues with Gentoo I haven’t seen many changes in the air; even though I think there is enough of a consensus among users and developers alike that staying “bleeding edge” does not have to force us to replace “unstable” with “unusable”, there is the (usual, I have to say) inertia against changing things around. While I won’t be revealing anything posted on the private gentoo-core mailing list, I would like to note that neither of the two developers I blamed, both privately and publicly, decided to respond to the criticism. And actually the latter kept on doing what he was doing before: “fixing” stuff he’s neither using nor testing.
And of course, QA hasn’t been moving to address the problem, nor has devrel as far as I can see. Does this give me any more trust in Gentoo? No, it actually saps it away, and is moving me day after day toward the edge of really giving the finger to the whole lot. While I haven’t decided yet what to do, it really makes me wonder if it’s worth the time I’m pouring into this. Having the tinderbox churn away, mostly at my private expense, while somebody decides to wreak havoc throughout the tree just to get things more “bleeding” doesn’t really appeal to me that much, even with all the users suggesting that I keep doing my work so that it doesn’t get worse. And for those who wonder, no, I’m definitely not going to work along with Robbins, unless he pays me to do that, and a lot as well; simply because I don’t think we could ever agree on what QA means: while the Gentoo Ruby team was pouring work into Ruby 1.9, Robbins thought that the idea was to just make Ruby 1.9 the default. Great move indeed.
And for those who wonder, yep, both developer A and B are easy to spot, since I named the packages they touched explicitly. My intention was not to take cheap shots at them, but rather a way to avoid smearing them; not with users, that in my opinion should be warned about the developers they entrust with their systems, but rather with search engines. If I were to name the developers explicitly, it might very well be that a search for their names in Google would turn up my posts; I’m pretty sure that possible future employers might not like what I wrote too much. Since I trust they might actually be better people and developers in their place of employment, I’d rather not add extra risks to them. You know I usually don’t have trouble naming names, if you follow me for long enough.
But before speaking more about leaving, let me try to dissect one further problem with the Gentoo ecosystem as it is now. We all know that we have problems with a shortage of developers. Indeed, most teams are understaffed, and that’s why we’re no longer bleeding edge. For this reason, quite a few people seem to be upset at the idea of getting developers out of the pool, for the pure matter of reducing the number of committers. While I could go on arguing about the quality-over-quantity topic (having developer B touching stuff he has no clue about is not good for the health of Gentoo), I have to accept that for some things, quantity is important: we cannot restrict working on ebuilds to people that use them if we don’t have enough people that use the stuff in the tree, which is why we have to accept at least some partial compromise on that ground.
So, why do we have this shortage of developers? We sure have no shortage of users, nor of ebuild submitters, nor of overlays to “breed” the developers. Well, there are in my opinion two main issues with the “be a developer” process. The first is that we have too few recruiters, the guys who actually make the developers; the second is that we have no real way to train those developers at all. These problems sit square with DevRel and QA respectively.
For what concerns recruiters, I’m going to state one thing that I think I stated before, though I’m not sure I stated it publicly enough: there is just too much red tape. There, I said it. With all due respect to the late Ferris, he was a great developer for the SPARC team, but his own background (being a lawyer) added so much to the DevRel team that it really seems to be a bureaucracy on par with most government agencies. I have no doubt that this is also why Bryan (kloeri) decided to go a different route with Exherbo (and I name another “taboo”; no, I’m definitely not even going to consider it; again it’s a social thing: while I’m probably going to agree on a few points, some of its developers have positively engaged in very personal insults against me, and I would never entrust them even with a calculator). In the particular case of recruiters, one of the requirements to become one is to review your quizzes again; I’ll get to the quizzes in a moment, but for now let’s just note that while it might be helpful for some developers to review those quizzes (trust me, developer B broke the rules behind at least one or two of the questions in them, repeatedly, without taking a hit), it is definitely something that bothers you quite a bit.
So back to the quizzes: they currently stop both current developers from becoming recruiters and new developers from being created. What’s the problem with these quizzes? Well, the first is that they are in DevRel’s hands. That’s a problem because DevRel’s task should be handling developers’ relations and not technical issues. The Council has given the task of handling technical issues with the tree to the QA project instead, together with the task of documenting such policies; why are the quizzes still written by DevRel then? And especially, do DevRel and QA really speak about those quizzes?
The answer here is easy if I’m allowed to be blunt: QA does not have the cojones to deal with that stuff. And that includes partly me as well. The quizzes as they are now are not really that good, not because the recruiters (Petteri in particular) haven’t been working hard on them, but rather because they come out of the wrong place. Let me name a few particular problems with them:
Documentation is the keyword in all of this. Lack of documentation obviously means no way to inform the new developers of what they should or shouldn’t do. The current documentation is scattered across the website, and that makes it very difficult to consult. The devmanual that Ciaran started was supposed to remove that trouble; unfortunately it has barely been touched for years (by me either, to be honest; there is still a very old PAM documentation entry that I wrote myself and submitted to Ciaran at the time, which is nowadays vastly outdated), and it contains too much “reference” and too little explanation for the new stuff (I can say lots of bad stuff about Ciaran, but I remember him starting out by explaining stuff rather than just adding references). Besides, it contains quite a few objectionable suggestions that are frowned upon by the current QA team and by quite a few developers.
In the current form, by the way, the DevManual is in my opinion unmanageable. Sorry Tim, but while I’m not a huge fan of RST myself, what we have now is an abomination: it uses the basic GuideXML syntax, with a custom stylesheet, but also a lot of custom markup! The repository has something like over 130 files named text.xml (I don’t know about yours, but my editor of choice, Emacs, shows me the base file name when switching between them, and with all of them having the same name it is almost impossible to find the right one right away!), with the diagrams all being diagram.svg and all the pages using “index.html” as the final name. I won’t make a mystery of the fact that I don’t like it at all; I actually started to work – with Equilibrium, an Italian user/AT – on a DocBook-based manual, porting the content, still in XML, to DocBook 5: no markup syntax extension was needed, it actually allowed a much more flexible grammar for the content, and it uses a standard syntax, not some concoction that looks similar to a Gentoo-specific XML format. We’re still going to ask contributors to learn an XML markup language; should we ask them to learn something that is not used at all outside Gentoo, or something that is being used in real life?
Now, let me try to show you the problems with a few of the questions in our quizzes, which is something I’m afraid people, especially in DevRel, will frown upon.
15. You find a package that will not build on some architectures without PIC (-fPIC) code in the shared libraries. What is the proper way to handle this situation?
This question is very deeply technical; I wrote about PIC many times, so I definitely agree this is something we should pay attention to. On the other hand, I’m pretty sure that the existence of this question in the quizzes predates the introduction of AMD64 as a daily-basis architecture; I’m pretty sure the question was there when I first took the quizzes about five years ago. The reason why I say so is that it was indeed something very difficult to grasp for people that only ever looked at x86 and that very rarely would have the chance to deal with the problem at hand; Position Independent Code in shared libraries was, five years ago, a topic well-known just for Alpha and a few other niche architectures; then AMD64 became mainstream, and PIC is now the norm. The number of projects trying to build non-PIC shared objects also dropped to almost zero (for the new stuff) for that very same reason. By the way, this problem can be solved with the help of the PIC fixing guide (which is part of the Hardened project, neither DevRel nor QA).
6.e
# Use an alternative implementation instead of the default depending
# on the foo use flag.
DEPEND="foo? ( cat-foo/alternative ) : ( cat-foo/default )"
Yet another question older than me as a developer… this one deals with a common construct used in programming languages called the “ternary operator” (?: in C and similar languages). Many people have suspected that the reason for this question is simply to make sure developers who know other languages won’t be confused by that operator; in truth, that’s not the case. If you look at this devmanual page, under the last paragraph, “Legacy Inverse USE-Conditional Dependency Syntax”, you’ll see this format described as a legacy, no longer valid form. Since when is it not valid? I cannot tell you for sure, but it was already invalid and discouraged over five years ago when I took the tests, and it is not supported by any modern implementation of ebuild-based package managers. Keeping this documented and asking questions about it makes no sense at all!
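For comparison, here is a minimal sketch of how the same intent is written with the currently accepted USE-conditional syntax, using a negated flag instead of the pseudo-ternary. The package names are the placeholders from the quiz question, not real packages.

```shell
# Depend on the alternative implementation when the "foo" USE flag is
# enabled, and on the default one when it is disabled.
# cat-foo/alternative and cat-foo/default are placeholder names.
DEPEND="foo? ( cat-foo/alternative )
	!foo? ( cat-foo/default )"
```

Each branch is an ordinary conditional group, so there is no special operator for the package manager to parse.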
8. Why are ‘head -5’ and ‘tail -5’ bad? What should be used instead?
Another question that makes sense only in the sense of history. The use of head -5 and tail -5 has been proactively discouraged for a while, as you can also note by the presence of fixheadtails.eclass that fixes those instances; while this is indeed correct, in the sense that POSIX specifies a different syntax for head and tail, this wasn’t much of a problem until GNU coreutils implemented a warning about the old syntax, which caused so much noise that developers explicitly went and converted all the usage of those two commands, both in ebuilds and in the scripts used by upstream. That warning went away, and while it’s still a good idea to stick to POSIX-compatible syntax, the quiz seems to stress the importance only on head and tail. Instead the most commonly misused command is likely to be find (as the GNU syntax, the FreeBSD syntax and the POSIX syntax are quite different one from the other), with sed being hardwired to GNU sed as the other implementations are much different.
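As a minimal sketch of what the question is after, the POSIX-specified spelling passes the count through the -n option. The helper names first_lines and last_lines are made up for this illustration.

```shell
#!/bin/sh
# Obsolete spelling (not accepted by every implementation):
#   head -5 file
#   tail -5 file
# POSIX spelling: pass the line count with -n.

# Hypothetical helpers, only to show the portable form.
first_lines() { head -n "$1"; }
last_lines() { tail -n "$1"; }

# Print the first five, then the last five, of seven numbered lines.
printf '%s\n' 1 2 3 4 5 6 7 | first_lines 5
printf '%s\n' 1 2 3 4 5 6 7 | last_lines 5
```

The same care applies to find and sed, where the differences between implementations are much larger than a single option.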
I took three examples of obsoleted, outdated questions, which might as well be changed or removed altogether if I asked to. On the other hand, I’d like to propose an alternative approach to the whole situation. Given that the quizzes are almost always submitted together nowadays, I’d propose merging the technical questions all together, and split out the organisational questions: you get an organisational and an ebuild quiz. Further points if the former is written by DevRel, and the latter by QA. Complete with (recruiter-private?) cheatsheet of the important things that the recruits need to note for each answer.
Even better, let’s try to fit into the whole system what a lot of other teams have been doing, including the Sunrise team: they are now requesting for just the (“easier”) ebuild quiz to be submitted by the recruits to get their overlay commit access. But rather than making it an ebuild quiz and an end of mentoring quiz like they are today, let’s make them an ebuild quiz, dealing with the ebuild-related questions only and a “upstream code maintenance” quiz, dealing with problems like automagic dependencies, PIC code, and so on so forth. Wouldn’t that make more sense than the current situation?
I don’t feel too well; I guess it’s the anger caused by the whole situation, coupled with lots of work to do (including accounting, as it’s that time of the year, for the first time in my case), and a personal emotional situation that went definitely haywire. I’m trying to write this while working on some other things, and eating, and so on and so forth, so it might not be too coherent in itself.
In yesterday’s post I pointed to a post by Ryan regarding testsuites, and the lack of consistent handling of testsuites when making changes. While it is true that there are a lot of ways for test failures to go undetected, I think there are some more subtle problems with a few of the testsuites I encountered in the tinderbox project.
One of these problems I already noted yesterday and it’s the lack of a testsuite from upstream. This involves all kind of projects, final user utilities, libraries (C, Ruby, Python, Perl), and daemons. For some of those, the problem is not as much as there is no testsuite, but rather that the testsuite doesn’t get released together with the code, for some reasons (most of which end up being that the testsuite outweighs the code itself many times), and that it’s not as easy to track down where the suite is. For Ruby packages, more than a few times we end up having to download the code from GitHub rather than using the gem, for instance (luckily, this is almost easy for us to do, but I’ll try not to digress further).
Some tests also depend on specific hardware or software components, and those are probably the ones that give the worst headaches to developers. For what concerns hardware, well, it’s tough luck, you either have the hardware or you don’t (there is one more facet regarding the fact that you might have the hardware but might not be able to access it, but let’s not dig into that). The fun starts when you have dependencies on some particular software component. This does not mean depending on libraries or tools; those are a given and cannot be solved in any other way besides actually adding the dependencies. It means depending on services and daemons being up and running.
Let’s take for instance the testsuite for dev-ruby/pg, that is the PostgreSQL bindings extension for Ruby. Since you have to test that the bindings work, you need to be able to access PostgreSQL; obviously you shouldn’t be running this against a production PostgreSQL server, as that might be quite nasty (think if the tests actually went to access or delete your data). For this reason, the 0.8 series of the package does not have any testsuite (bad!). This was solved in the new 0.9 series, as upstream added the support to launch a private, local copy of PostgreSQL to test with. This actually adds another problem but I’ll go back to that later on.
But if database server related problems are quite obvious (which is why things like ActiveRecord only have tests running with SQLite3, which does not need any service running), there are worse situations. For instance, what about all the software communicating through D-Bus? The software assumes it is able to talk with the system instance of D-Bus to work, but what if you’re going to test disruptive methods and there is a local, working, installed copy of the same software? In general you don’t want tested software to interact with the software running on the system. On the other hand, there are a number of packages that fail their tests if D-Bus is not running, or, in the case of sbcl, if there is no syslog listening to /dev/log. These will also create quite a stir, as you might guess.
Now, earlier I said that the new support for launching a local instance of PostgreSQL in the pg 0.9 series creates one further problem; that problem is that it now adds one limitation on the environment: you have to be able to start PostgreSQL from the testsuite; what’s the problem with that? Well, to be able to run the PostgreSQL commands you need to drop privileges to non-root, so if you run the testsuite as root you’ll fail… and while Portage does allow running tests as non-root, I’m afraid it’s still defaulting to root (FEATURES=userpriv is the one that controls the behaviour). And even if the default was changed, there are other tests that only work as root, or even some, like libarchive’s, that run slightly different tests depending on which user you’re running them as. If you run them as root, they’ll ensure, for instance, that the preservation of users and permissions works; if you run them as non-root, that you cannot write as a different user or cannot restore the permissions.
You can probably start to see what the problem is with tests: they are not easy; getting them right is difficult, and more often than not, the upstream tests only work in particular environmental conditions that we cannot reproduce properly. And a failure in the testsuites is probably one of the most common showstoppers for a stable request (this is important when the older version worked properly, while the new one fails, as regressions in stable have a huge marginal cost!).
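To make the root versus non-root split more concrete, here is a rough sketch of how a testsuite might pick its checks based on the user it runs as. This is not libarchive’s actual code; the function name select_test_mode and the two mode labels are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: given a numeric user id, report which group of
# checks a privilege-sensitive testsuite would run.
select_test_mode() {
    if [ "$1" -eq 0 ]; then
        # As root: verify that owners and permissions are preserved.
        echo "preserve-ownership"
    else
        # As a normal user: verify that restoring someone else's
        # ownership or permissions is refused.
        echo "refuse-foreign-ownership"
    fi
}

# Pick the mode for whoever is running the script.
select_test_mode "$(id -u)"
```

The point is that the same testsuite exercises different code paths depending on the privileges it gets, so a pass as one user says little about the other.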
In my previous post about the possibility of me leaving and why, a few people commented on the “staleness” of the stable (and unstable, to some extent) trees in Gentoo. Now, I won’t argue that there are no problems; I actually said so myself a few months ago. But I’d like to clarify a few points related to the process of marking packages as stable.
First of all, we have to distinguish between two different types of staleness: single-package staleness versus systematic staleness; the latter case is what we had regarding Perl, and it’s much more complex than most users think. It wasn’t just a matter of making Perl itself work, it also involved making sure that the packages using Perl worked properly. This is also easier said than done: while perl-cleaner can take care of reinstalling the packages that link to or extend Perl, it does not take care of scripts written in Perl; even looking at the reverse dependencies doesn’t suffice, as we’re still omitting system-set dependencies (and guess what? Perl is in the system set!).
And even if we were able to track down all the packages using Perl, directly or indirectly, in the tree, and I could get all of them passed through the Tinderbox (I did), we wouldn’t be too sure about its absolute solidity because half those scripts don’t have testsuites (see also this post by Ryan on the subject of tests). On the whole, though, we caught a few important failures, and we could assess that the tree was mostly ready, and so, finally, Perl 5.10 entered the tree, yay! I don’t think the road to stable is going to take much more time, by the way; I’d be tempted to try soonish to use it for xine’s bugzilla (at least mirrored on this server first of course).
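Just to give an idea of why scripts slip through, here is a rough sketch of the kind of scan that would be needed to spot them: look at the first line of every file under a directory and report those with a Perl interpreter line. The function name list_perl_scripts is made up, and the check is only a heuristic (it misses scripts started through a wrapper and does not cope with file names containing newlines).

```shell
#!/bin/sh
# Hypothetical helper: list regular files under directory $1 whose
# first line is a "#!" interpreter line mentioning perl.
list_perl_scripts() {
    find "$1" -type f | while IFS= read -r file; do
        if head -n 1 "$file" | grep -Eq '^#!.*perl'; then
            printf '%s\n' "$file"
        fi
    done
}

# Example use (the directory is only an illustration):
#   list_perl_scripts /usr/bin
```

Even a scan like this only tells you which files are Perl scripts, not whether they still work with the new interpreter.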
A similar issue happens with Ruby 1.9. We’re taking our time to get it unleashed even in unstable; right now it’s tightly masked. Why? We’ve been struggling with lots of trouble, which then came down to the Ruby-NG eclasses; those are one very important piece of the puzzle for us, as they allow us to properly support multiple Ruby implementations without breaking the dependency tree or the general solidity of the system. It’s not perfect yet, and needs polishing, but for instance a couple of days ago I committed changes to the eclasses that ensure that scripts installed for a single implementation won’t break when a different one is selected (this partly covers the problem described here but not entirely; to cover that properly we’re going to take a bit more time, I’m afraid, as we’re going to revisit the whole idea of a selected Ruby implementation). And if you think that Ruby 1.9 is ready for prime-time right now, there’s a reality check waiting for you. Yes, I rant about Ruby, but I think I also have a positive stance as I have patches (and a lot of those are merged upstream and even released).
You might have noticed my use of the first-person-plural pronoun (“we”). I’m most likely going to take a break from active Gentoo work, and try to reduce it for a while to the areas I’m interested in, trying not to feel too pressured about it. For instance I unsubscribed from the Gemcutter feed that tells me when Ruby packages are released so I don’t feel the urge to bump them. On the other hand, I don’t think it’s feasible for me to leave Gentoo, at least for what concerns Ruby and other things I work on. If worse comes to worst, I’ll get a frontend machine and use Fedora or something on that, with Gentoo as the backend server. Obviously if nothing can be fixed regarding the issue I brought up, I’m not going to stick around for long, but I hope I got enough people to think about the problem that it can be solved, in Utopia at least.
So I have just shown you two systemic staleness problems in Gentoo (one which is partly solved, one that is actually caused by an external lack of stability that we are trying to resolve at the roots). What about the single-package staleness? Well, there are many examples of that and the problems can range over very wide areas. People forget to ask for stuff to be marked stable; developers might not think anybody needs that stuff stable anyway; packages might require specific hardware to be marked stable but no developer with such hardware can do that (think about the EntropyKey software that I maintain(ed) in the tree: you cannot say whether it works or not without having a hardware key yourself; I don’t know of any other Gentoo developer having them, so what would happen if I left?), or they might have complicated testing procedures that are difficult to reproduce.
On these matters, the number of people working on the stabling process is not a binding factor; throwing more people at the problem is not going to solve it any sooner (by the way, on this last phrase of mine, I’ll most likely be posting something in the next few days, again to make some points on why I reached the bad point of snapping). Not unless you throw the right people at the right problem. The problem here is not really the stabling part, which might actually take very little time; the problem is that we have to document things, such as the testing procedures. Sometimes we have thorough testsuites, most of the time we don’t (in the case of Ruby, even when we have them, they can be… tricky). I tried something some time ago, but it didn’t turn out the way I was hoping; in the end I actually stopped working on finishing that one because my half-easter egg, half-free culture community collaboration (alliterations…) crashed down in flames as the source I wanted to use, Jamendo, couldn’t get its own facts straight.
I don’t want this post to go too deeply into the technical problems of testing, as this is better discussed separately, and most people interested in the topic I’m writing about might not be interested in the technical details. Let’s just say that I have seen a huge improvement in tests in the past few months. And further kudos to two teams who I know are documenting post-build testing procedures to tell arch teams what to look at when testing their packages: the Java and Emacs teams.
Now comes what might disappoint a few users, those users who think and assert that the solution to staleness is the reckless commit of half-broken ebuilds, like Samba. I’m going to argue that the opposite is true (and I’m again borrowing a line out of NewsQuiz… I might have been listening too much to that program; my actual post style yesterday was probably deeply influenced by the newly-restarted Real Time with Bill Maher instead, but I digress).
First of all we have to agree on one point: staying a lot behind upstream sucks. Sucks for users and sucks for upstream as well. As Joost, from Sabayon, said to me earlier today (I’m following The Other Diego’s philosophy that today starts when I wake up, and ends when I fall asleep), upstream will be bothered if users won’t be testing their recent versions at all, and would rather stick to old, known-broken, already-fixed versions. Having been (and still being) on both sides of the fence, upstream and downstream, I can tell you that the best feeling is when you can actually have distributions always using your latest, greatest code. This is, though, not always that simple, or feasible at all, because of upstream’s own actions, but again this is a topic for a different day.
Back to our reckless commits we go. Let’s take the example of Samba, since that’s what a commenter named, and something that, I think, shows best what the trouble with “Developer B” consists of. One of his justifications is that the current stable Samba is vulnerable; I’m afraid to tell you all, guys, that it might well be true. I use a conditional here, because I didn’t have the time, nor the will, to track down whether it’s actually true or just speculation; it is, though, true that our Security team, also understaffed, hasn’t had time to deal with all the lower-level security issues in a few weeks; I’m pretty sure they’ll catch up soon. Now, if the problem was security, we should be striving to get the new ebuilds stabled soon, shouldn’t we? And to do that, you should be working actively to reduce the number of bugs in those ebuilds.
Neither seems to be happening; the stable tracking bug actually reports that x86 is waiting, and last I checked with them, they were actually tempted to go with 3.3 still. This is quite understandable, as 3.4 is now fully split, but unpolished and without any plan on how to migrate from monolithic to split. At the time X.Org went through the splitting up, Donnie planned months ahead; now, of course that consisted of over a hundred packages, maybe even a few hundred, and this is a much smaller scale, but the very fact that “Developer B”, when asked about a migration plan, replied to me that it was too boring to do should set the mood straight on the issue. Not that it’s going to matter anyway: 3.5, and maybe even 3.4, is going to be monolithic again. Yes, you’re going to get blockers, removals and so on again on unstable, oh joy!
And these bugs are assigned to a team that does not include our mysterious “Developer B” as he didn’t add himself to the alias, as I said before. Is he CCed on any of those? Nope; okay, this might be QA’s (my!) fault, as I should have noticed earlier that he wasn’t on the alias and either reported that to devrel, or added him forcefully. Now, as most of these issues are important you’d expect that he’d be working on finishing this task, rather than going off and, oh, bumping another subsystem that he doesn’t even use. But he won’t care; why? Because he admitted many times he does not even use Samba! Nor Mono! Again, try to wrap your mind around this concept: how can he be improving the situation for their users, not being one himself? Not feeling the pain, nor sharing the gain?
Any kind of reckless bump and non-trivial change in a subsystem will require a long time to deal with, and the more you tend to stray away from the upstream-sanctioned behaviour, the more you’re going to suffer when it’s time to follow their lead. When you have to make big changes you compromise. One of these compromises in Ruby land has been that of trying to get the latest non-ported ebuild stabled if an old stable was present, before moving fully toward Ruby-NG. It’s going to have some growing pains, and yes, you have to use fully unstable (for the Ruby ebuilds, not for the whole tree, of course!) for it to work for now, but we are usually quite conservative about making sure that it works as intended.
When you make big changes, and you don’t plan, nor compromise, on how to deal with them in the long run, you’re just going to suffer, or you might just end up with an even more stale stable tree than you started with. On the other hand, it might be much, much worse if the stable tree gets broken badly, because packages that haven’t been planned ahead are moved there to remove the staleness. And by the way, this does not mean that I’m not saddened by the fact that to use Gentoo properly on things like vserver, xen or lxc guests, you’re basically forced to use some unstable packages, as OpenRC is the only one that works, and Baselayout 1 is definitely rotting in tree. Unfortunately I’m also quite sure that there are packages that are not fixed for that yet.
Anyway, to cut this post short so I can also get some sleep and do more useful work tomorrow, I’d like to point out to the concept of marginal cost that I was introduced to by a splendid book by Richard Dawkins on evolution. The marginal cost of stabling something depends on many factors; one of them is the amount of changes since the previous revision (which is why major version bumps, or total changes in the system’s packaging, make it harder to stable it), another is regressions from the previous version (dropping patches that no longer apply, but are still not fixed upstream increases the marginal cost tenfold). Our perfect setup is to always have a very low marginal cost for stabling, and that means not changing the ebuilds in any drastic way unless strictly needed.
But if we take the example of the recurrent laryngeal nerve that Dawkins uses in his book as a proof of evolution, we can easily see that we’re not in a biological evolution scenario: we can make drastic changes when needed to solve a situation that is blatantly out of place. In such cases, though, we’re going to increase our marginal cost for stabling… and have to accept a longer stable delay. And that brings us to various possible ways to tackle that, which are too technical for most of the people reading this in the first place, and which I’ll discuss in the next few days instead.
Yesterday I snapped and declared my intent to resign from Gentoo, together with stopping the tinderbox and giving up using Gentoo as well. Why did that happen? Well, it’s a huge mix of problems, all joined together by one common factor: no matter how much work I pour into getting Gentoo working like it should, more problems are generated by sloppy work from at least one or two developers.
I’m not referring to the misunderstandings about QA rules, which happen and are naturally caused by the fact that we’re humans and not beings of pure logic (luckily! how boring it would be otherwise, to always behave in the most logical way!). Those can upset me but they are, after all, no big deal. What I’m referring to is the situation where one or two developers can screw up the whole tree without anybody being (reasonably) able to do a thing about it. We’ve had two (different) examples in the past few months, and while both have undeniably bothered QA, users, and developers alike, no action has been taken in either case.
We thus have developer A, who decided that it’s a good idea to force all users to have Python 3 installed on their systems, because upstream released it (even though upstream considers it still experimental, something to toy with), and who kept on ignoring calls for dropping that from both users and developers (luckily, the arch teams are not mindless drones, and wouldn’t let this slide to stable as he intended in the first place). The same developer also hasn’t been able to properly address one slight problem with the new wrapper, months after unleashing it on the unstable users (unstable does not mean unusable).
Then we have developer B, who feels like the tree’s saviour, the only person who can make Gentoo bleeding edge again… while most, if not all, of the rest of the developer pool is working on getting Gentoo more stable and more maintainable. So, among the things he went on to do, there was a poorly-performed Samba bump (suboptimal was the term he used; I ended up having to fix the init scripts myself because they weren’t stopping/restarting properly, as the ebuild and the init scripts went out of sync regarding paths), some strangely incomplete PostgreSQL changes, and a number of minor problems with the packages.
Of the two, I was at first most upset by the former, but in the long run the latter is the one who drove me mad. Let’s not dig too much into his stance on --as-needed (cosmetics; yeah, because being able to come out of a jpeg bump with fewer than 100 packages to rebuild, rather than the whole world, is just cosmetics), or the fact that he’s ignored most of the QA issues with the packages he touched. Instead, look at his behaviour with a package of mine (alas, I made the mistake of letting this one slip with just a warning; I should have taken the chance to actually refer it to devrel…): vbindiff.
The package is something I added a while ago because from time to time it comes in useful. I’m in metadata.xml; I’m definitely not an unresponsive maintainer. Yet, while my last bump was in June 2008, the version in tree was not the latest one up to last September (2009). Why? A quick glance at the homepage shows that the beta4 release was mostly fixing a Win32 bug, and introducing a way to enable debug mode. So what happens? Our mighty developer decides to go on and bump the package; without asking me; with nobody asking him; without a mail, a nod or anything. I literally notice this as emerge tries to upgrade a package I know I maintain. You’d expect the debug support to be present in the ebuild then, and indeed you’d find a debug USE flag if you checked now, but that’s something I added myself afterwards, as the damage of pointlessly bumping something was already done.
Now, why did that happen? Well, he admitted he just went through the dev-* categories, without considering the maintainers declared in metadata, and blindly bumped ebuilds when the latest version available on the site was higher than the one in tree. Case in point: he had to open the vbindiff site, and thus the release notes regarding Win32 and --enable-debug would have been clearly visible, if he had cared to read even part of them. Whoever has tried doing serious ebuild business should know that most of the time even the upstream-provided release notes are not something to go by… Interestingly enough, his bleeding-edge hunger didn’t make him ask for a new stable, and we currently have a very old one.
So there we have your developer B, the super-hero, the last good hope of the bleeding edge, who bumps packages without consulting the guy who maintains them (and is around almost 24/7) and without even caring to use them at all. Why did I let it slip? Probably because I was mostly focused on trying to stop developer A at the time. I did issue a reprimand, reminding him not to touch someone else’s packages, and to learn to use package.mask for things like Samba. I was hoping he would listen. Oh boy, was I ever so wrong.
Speaking about Samba again for a second, did I tell you yet that the split into multiple packages was done, straight to ~arch, without any plan to follow up and convert dependencies? Wonder why the whole thing is now stalemated again. Maybe the arch teams are not too keen on having the same kind of dependency breakage in stable as there was/is in unstable right now.
First-hand information about our developer B has him aligned with a zealot’s point of view regarding the Mono project; you’d then guess that dotnet stuff would be the last thing he’d be touching. Instead he bumped it, without asking anybody, ignoring the fact that I stated at FOSDEM that I was going to look into it as soon as I had time, the fact that I had stated multiple times before that I was already working on un-splitting the gtk-sharp packages, and the fact that I got in contact with the Mono developers (again at FOSDEM) to try following upstream more closely. Oh, and the one thing that pissed me off about that bump? Besides the fact that tomboy now refuses to work? Remember this patch? It was dropped, without even mailing me to ask whether I had, or could make, a version for the latest release. It was dropped in unstable (or, as it should be called if this kind of stuff is allowed to continue, unusable).
And the cherry on top? As I said, this developer touched Samba, PostgreSQL, now Mono… there are three aliases for these things (samba, pgsql-bugs and dotnet), which the bugs are assigned to… and he’s on none of them! And before somebody tries to argue that, I’m pretty confident he’s not following the aliases on Bugzilla (plus, given he also argued that the problem was with leaving security-vulnerable stuff in the tree – which by the way means having working, complete, safe ebuilds to be able to mark stable, and he doesn’t seem to be able to come up with any of those – the most important security bugs don’t get sent to watchers). How does he expect to see the bugs coming? Oh, but by wrangling the bugs himself! Yeah, after all developers don’t file bugs themselves, assigning them straight to the maintainers by procedure, do they? (Fun fact: Bugzilla queries report at most 5K bugs, so that list is a very limited result compared to what I was hoping to get.) Nor do other developers ever wrangle, that would be silly, and there is no Arch Tester to speak of, right?
You can now see most of the picture, and why I’m mostly upset with developer B. What made me snap yesterday were remarks that insisted that I was just “whining” and “not doing enough” as bugs kept piling up. What the heck? I have constantly had over 1000 bugs (over 1300 today) for the past year or so; I know very well that bugs keep piling up! And I’ve been doing all I can outside of my work hours (while I have to thank some people, including Paul, David, Simon, Andrew and Bela for their contributions, I’m not paid to do Gentoo work; and while I do get to use it, and thus contribute back to it, for some of the jobs I take, it’s definitely not the same as working on Gentoo), including the whole RubyNG porting and improvement, trying to make sure we can actually get to a point where unmasking Ruby 1.9 will not break any user whatsoever. Am I really doing too little? “Not enough”?
Okay, so the proper way to handle this, with the current procedures, would be to take this up to Developers’ Relations so that they could act on it; QA can only ask infra to restrict commit access if we’re expecting a grave and dangerous breaking of the tree, or misuse of commit rights. So why didn’t I bring this up to devrel? Well, the main reason is that devrel nowadays, as far as I can tell, is exactly three people: Petteri, Denis and Jorge, and of the three the only one who’s for preventive suspension of commit rights is Denis (this has been proven with the case about developer A above); one out of three does not really sound like much of a chance for this to improve the situation. And if – again as happened with developer A – DevRel then decided that the right action would be to issue a reprimand, that would amount to scolding the developer and asking him to work more with others… well, it wouldn’t change a thing.
The whole QA system has to change! We’ve got to write down guidelines, rules, and laws, and be conservative in applying them. You shouldn’t go around breaching them and then appealing when QA finds you out of line; you should talk with QA if you feel the rule is misapplied to your case in any way.
So here you go, in a nutshell, why my preservation instinct right now is telling me to flee. I’m not sure yet if I’ll outright flee or just give it time for the situation to be addressed and then decide. The reason is: I still like the Gentoo system, and since I rely on it for my work I cannot leave it alone; if I were to move to anything else I would have to spend (waste?) even more time to fix the same issues anyway, and I’d much rather get Gentoo working right. But I cannot do this alone, and I especially cannot do this if I have support from neither developers nor users. So please voice your concern.
If you feel like Gentoo needs better QA, if you feel like we shouldn’t be translating unstable to unusable, then please ask for it. I’m not saying that we should become stale like Debian stable, but if it takes a few months to get something straight, then it should take its time and not be forced through (that’s what the Ruby team has been doing all this time to work with Ruby 1.9 and Ruby EE and other implementations as well!). If you use Twitter, identi.ca, Digg, Reddit, Slashdot, whatever, get this post running. Maybe I’m subverting the process, but to quote BBC’s NewsQuiz, “Trial by media is the most efficient form of justice” (this was in reference to the British MP expenses scandal last year), and right now my only concern is effectiveness.
You might remember my quick review of the Secure Programming with Static Analysis book. While overall I was expecting a much more practical view on how to maximise the gain from static analysis (like how to make sure that trying to get rid of false positives does not end up cluttering both the source and the produced object code), it had some quite important insights that I think are worth the read, and the price of the book.
One of these insights is an explanation of why Microsoft’s “secure” interfaces differ from the standard POSIX ones. Having a status code returned, to check whether the action completed, failed, or completed-with-truncation, is definitely more useful than being returned one of the two pointers that are already provided as input. Similarly, the book shows some of the “secure wrappers” commonly used for replacing inherently insecure functions such as readlink().
Now, on the whole, this is all good, but I noticed one thing while following the libvirt development mailing lists: people end up reinventing tons of little wheels all around. While I like the idea behind gnulib, and I even wrote an article about its use a long time ago, it starts to show a couple of shortcomings in my view. The first is that the same source code has to be bundled with a number of projects; while it’s ignored for the most part on modern systems that have the functions available, it’s still source code that is shipped around multiple times and that might have nasty problems. The second problem is that both on modern systems (when wrappers are involved) and on less-modern systems (or systems that comply with older versions of the various standards, such as Solaris, or AIX) the same object code is added to multiple binaries, instead of being shared among them, increasing both the on-disk and in-memory sizes. It also puts the burden of verifying, and replacing, interfaces on the single programs rather than centralising it in a project.
Why bother, given that then you might as well just port a subset of the GNU C library (or just use a ported uClibc), and at that point you might as well not use that operating system at all? Well one of the problems with the current approach is felt even by users running Linux, Gentoo users in primis as they feel the slowness of running ./configure and having to check for the same features every time (compare this old post of mine — the best way to make a configure script faster is to reduce the number of tests it has to perform!). Shouldn’t it be enough to assume that the interfaces are present, and leave it to the user to provide a replacement library if they are not?
This is after all the favourite approach of the FFmpeg project: if POSIX or C99 mandates the presence of an interface, then FFmpeg can use it; if it’s not available, it’s up to the user/developer/packager to provide the proper flags, include paths, extra libraries to have them available. Non-standard compiler features used are a different matter, of course.
But even if having some sort of libgnucompat or libposixcompliant library to deal with other operating systems would solve that problem, it does not solve another problem that I’ve noticed applies to libvirt: reinventing wrappers, be they security wrappers or not. Indeed if you look at the symbols exported by libvirt.so, you’ll easily see that there are sixteen functions with the virFile prefix that seem to be just convenience and security wrappers around common file operations. This reduces the amount of boilerplate code that libvirt developers have to write each time they have to use that particular feature, but then you think that similar code is written by many other projects as well to deal with the same situation; this is where convenience libraries come into being, stuff like glib, for instance.
Unfortunately, since there’s more than one way to skin a cat, there is no shortage of convenience libraries, even conflicting convenience libraries, out there. And nobody seems to agree on what’s the right way to do them (for instance, I can very well appreciate the hatred of glib’s use of g-prefixed basic types, such as guint8 and gpointer, rather than keeping to the standard types that are available in C99, such as uint8_t). While these are not always available, it’d make much more sense to make those available rather than inventing your own, no? But let’s not dwell on that topic for now.
Some of the most common wrappers are also slowly getting into the C libraries and the actual standards, although sometimes with not-too-bright results (the getline() function really could have used a nicer, more specific name), and other times with huge feuds between implementers (has anybody seen the strl*() functions in POSIX yet? or glibc?).
For all its defects, shared with the other autotools, libtool has probably produced one of the best wrappers out there: libltdl. With all its possible problems (and there are many), that library is well designed enough to be usable in at least three widely different configurations (as described), including the ability to bundle a copy of the library but still use the system copy if so asked (or even by default). Too bad this does not seem to happen with any other kind of wrapper library.
This seems to be the opposite of the situation within the Ruby community; maybe because creating and publishing a gem is so easy (especially compared to the standard track of release publishing for C-based libraries and packages, or mostly any other compiled language), we have a huge number of “code fragment” gems, that provide one or two source files, with either a couple of classes or a handful of useful functions that are then reused in multiple packages by the same author. Not that the Ruby way here is perfect (but it surely is better than other Ruby ways I ranted on about before), and one of the biggest problems is that many times you have multiple gems solving the same problem over and over again, like for testing systems.
I don’t hold much hope that developers can sit down together and decide to work on a single implementation of anything, but it sure would be nice if it happened. You’d then have the same code shared among all processes, with no duplication, with a lot of eyes looking at the possible faults and solving them, and so on and so forth. Yes, it’s definitely a utopian point of view. Alas.
I have to say that in the months we’ve been working on the new eclasses, I never got around to properly describing how to use them. My hope was to write this documentation straight into the next-generation development manual for Gentoo, but since that project is far from coming, I’ll just rely on my blog for a little while more.
As described in my blog posts, the idea behind the “new” (they have been in tree for a few months by now) eclasses is to be able both to handle “proper” Gentoo phases for packaging gems, and at the same time to manage dependency and support tracking for multiple Ruby implementations (namely, Ruby 1.8, Ruby 1.9 and JRuby right now). How can we achieve this? Well, with two not-too-distinct operations: first of all we avoid using RubyGems as a package manager – we still use, in some cases, the gem format, and we always use the loader when it makes sense – and then we leverage the EAPI=2 USE-based dependencies.
Why should we not use RubyGems package management for our objective? With the old gems.eclass we used to encapsulate the install operation from RubyGems inside our ebuilds, but it was all done at once, directly into the install phase of the ebuild. We couldn’t have phases (and related triggers) such as prepare, compile, test and install. In particular we had no way to run tests for the packages at install time, which is one of the most useful features of Gentoo as a basis for solid systems. There are also other problems related to the way the packages are handled by RubyGems, including dependencies that we might want to ignore (like runtime dependencies injected by build-time tools), and others that are missing in the specification. All in all, Portage does the job better.
As for the USE-based dependencies, when we merge a package for a set of implementations (one, two, three or any other number), we need its dependencies (at least, the non-optional ones) installed for the same set of implementations, otherwise it cannot work (this is a rehashing of the same-ABI, any-ABI dependencies problem I wrote about one and a half years ago). To solve this problem, our solution is to transform the implementations into USE flags (actually, they are RUBY_TARGETS flags, but we handle them exactly like USE flags thanks to USE_EXPAND). At that point, when one is enabled for a package, the dependencies need to have the same flag enabled (we don’t care if a dependency has a flag enabled that is not enabled in the first package, though).
This actually creates a bit of a problem though, as you end up having two sets of dependencies: those that are used through Ruby itself (same-ABI dependencies) and those that are not (any-ABI dependencies), such as the C libraries that are being wrapped around, the tools used at runtime by system calls, and so on so forth. To handle this, we ended up adding extra functions that handle the dependencies: ruby_add_bdepend and ruby_add_rdepend, both of which “split the atoms” (yeah this phrase sounds nerdy enough), appending the USE-based dependencies to each. They also have a second interface, in which the first parameter is now a space-separated (quoted) list of USE flags the dependency is conditional to.
This is not the only deviation from the standard syntax that ruby-ng.eclass causes, and the other is definitely more substantial: instead of using the standard src_(unpack|prepare|compile|test|install) functions, we have two sets of new functions to define: each_ruby_$phase and all_ruby_$phase. This ties into the idea of supporting multiple implementations, as there are actions that you want to take in almost the same way for all the supported implementations (such as calling up the tests), and others that you want to execute just once (for instance generating, and installing, the documentation). So you get one each function and one all function for each phase.
There are more subtle differences of course; in the calls to the each type of functions you get ${RUBY} set to the command that calls the current implementation, while in the all functions it’s set to the first-available implementation (this is important as we might not support the default implementation of the system). The end result is that you can call neither scripts nor commands directly; you should, instead, use the ${RUBY} -S ${command} format (for the commands in the search path, like rake, at least), so that the correct implementation gets called.
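To make the each/all split a bit more concrete, here is a toy model of the dispatch in plain shell. It is a sketch and not the actual ruby-ng.eclass code: `IMPLEMENTATIONS` and `run_phase` are names I made up for the illustration, `${RUBY}` just holds the implementation name rather than a real command path, and the two functions only echo what an ebuild would run.

```shell
#!/bin/sh
# Toy model of the each/all dispatch. This is NOT the code of ruby-ng.eclass;
# IMPLEMENTATIONS and run_phase are hypothetical names used for illustration.
IMPLEMENTATIONS="ruby18 ruby19 jruby"

# What an ebuild might define: run the tests once per implementation.
# The command is only printed here; a real ebuild would execute it.
each_ruby_test() {
    echo "${RUBY} -S rake test"
}

# What an ebuild might define: install the documentation a single time.
all_ruby_install() {
    echo "install docs (generated with ${RUBY})"
}

# Call each_ruby_<phase> for every implementation, then all_ruby_<phase> once.
run_phase() {
    phase=$1
    for impl in ${IMPLEMENTATIONS}; do
        # Inside the each_* functions, RUBY is the current implementation.
        RUBY=${impl}
        if command -v "each_ruby_${phase}" >/dev/null 2>&1; then
            "each_ruby_${phase}"
        fi
    done
    # Inside the all_* functions, RUBY is the first available implementation.
    RUBY=${IMPLEMENTATIONS%% *}
    if command -v "all_ruby_${phase}" >/dev/null 2>&1; then
        "all_ruby_${phase}"
    fi
}

run_phase test
run_phase install
```

Running it prints one rake line per implementation for the test phase, and a single line for the install phase, since only an each function is defined for the former and only an all function for the latter.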
Oh and of course you cannot share the working directory between multiple implementations, most of the time, especially for the compiled extensions (those written in C). To solve this problem, at the end of the prepare phase, we create an implementation-private copy of the source directory, and we use that in the various each functions; to be on the safe side, we also keep a different source directory for the all functions, so that the results from one build won’t cause problems in the others. To avoid hitting performance too much here, we use exactly two tricks: the first is to use hardlinks when copying the source directories (this way, the actual content of the files is shared among the directories, and only the directory entries are duplicated); the second is to invert the order of the all/each calls on the prepare phase.
While in all other cases all is executed after the implementation-specific functions, in the prepare phase the all function is executed before the each functions… which are preceded by the copying, of course. This means that the changes applied during the all_ruby_prepare function are done over the single generic directory, which is then copied (hardlinked) to the others.
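The two tricks can be tried out outside of Portage with a few lines of shell. Again, this is only a sketch under my own assumptions: the directory layout, the file name and the implementation list are invented, and I’m assuming GNU cp for the -l (hardlink) option; the eclass may well spell the copy differently.

```shell
#!/bin/sh
# Toy model of the prepare step; NOT the eclass code. Directory names, the
# file and the implementation list are hypothetical. Assumes GNU cp (-l).
set -e
work=$(mktemp -d)
trap 'rm -rf "${work}"' EXIT

mkdir -p "${work}/all/lib"
echo 'puts "hello"' > "${work}/all/lib/example.rb"

# all_ruby_prepare-style change: made once, on the generic directory,
# *before* the copies exist, so that every copy inherits it.
echo '# patched' >> "${work}/all/lib/example.rb"

# One private copy per implementation; -l hardlinks the files instead of
# duplicating their content, -R recurses, -p preserves modes and timestamps.
for impl in ruby18 ruby19 jruby; do
    cp -Rpl "${work}/all" "${work}/${impl}"
done

# The inode number is the same in every directory: one copy of the content.
ls -i "${work}"/*/lib/example.rb
```

One thing to keep in mind with hardlinked copies: anything that rewrites a file in place after the copy (like the `>>` append above) would show up in all the directories at once, which is one more reason to make the shared changes before copying.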
So this covers most of the functionality of the ruby-ng.eclass, but we had another tightly-related eclass added at the same time: ruby-fakegem.eclass. As the name lets you guess, this is the core of our ditching RubyGems as a package manager entirely. Not only does it give us support for unpacking the (newer) .gem files, but it also provides default actions to deal with testing, documentation and installation; and of course, it provides the basic tools to create fake RubyGems specifications, as well as wrapping of gem-provided binaries. An interesting note here: all the modern .gem files are non-compressed tarballs that include a compressed metadata YAML file and a compressed tarball with the actual source files; in the past, there have been a few gems that instead used a base64/mime encoding for sticking the two component files together. For ease of maintenance, and for sanity, we’ve decided to only support the tarball format; the older gems can be either fixed, worked around or replaced.
The boilerplate code for ruby-fakegem assumes that most gems will have their documentation generation, and tests, handled by means of rake; this is indeed the most common situation, even though the details are definitely not the same among different projects. As I said before, Ruby’s motto is definitely “there are many ways to skin a cat”, and there are so many different testing frameworks, with different task names, that it’s not possible to have the same exact code work for all the gems unless you actually parametrise it. The same goes for the documentation building, even when the framework is almost always the same (RDoc; although there are quite a few packages using YARD nowadays, and a few that are using Hanna, which we don’t have in tree, nor will support, as it requires a specific version of the RDoc gem, an older one). The result is that we have two variables to deal with that: RUBY_FAKEGEM_TASK_TEST and RUBY_FAKEGEM_TASK_DOC, which you can set in the ebuild (before inheriting the eclass) to call the correct task.
Now, admittedly this goes a bit beyond the normal ebuild syntax, but we found it much easier to deal with common parameters through variables set before the inherit step, rather than having to write the same boilerplate code over and over… or having to deduce it directly from the source code (which would have definitely wasted much more time). Together with the two variables above we have two more to handle documentation: RUBY_FAKEGEM_DOCDIR, which is used to tell the eclass where the generated documentation is placed, so that it can be properly installed by the ebuild, and RUBY_FAKEGEM_EXTRADOC, which provides a quick way to install “Read Me”, “Change logs” and similar standalone documentation files.
Finally, there are two more variables that are used to handle more installation details. RUBY_FAKEGEM_EXTRAINSTALL is used to install particular files or directories from the sources to the system; this is useful when you have things like Rails or Rudy wanting to use, at runtime, some of the example or template files they are shipped with; they are simply installed in the tree like they were part of the gem itself. RUBY_FAKEGEM_BINWRAP is the sole glob-expanded variable in the eclass, and tells it to call the “binary wrapper” (not really binary, but rather a script wrapper; the name is due to the fact that it refers to the bin/ directory) for the given files, defaulting to all the files in the bin/ directory of the gem; it’s here to be tweaked because in some cases, like most of the Rudy dependencies, the files in the bin/ directory are not really scripts that are useful to install, but rather examples and other things that we don’t want to push into the system’s paths. It also comes in useful when you might want to rename the default scripts for whatever reason (like, they are actually slotted).
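Putting the variables together, the top of an ebuild could look roughly like the sketch below. Take it as an illustration only: the package, its dependencies, the file names and the metadata are all hypothetical; USE_RUBY is how I’d expect the list of supported implementations to be declared, which the text above does not spell out; and the two-argument ruby_add_bdepend call is my reading of the “second interface” described earlier.

```shell
# Sketch of a hypothetical dev-ruby/example ebuild; not taken from the tree.
EAPI=2

# Assumption: the supported implementations are declared through USE_RUBY.
USE_RUBY="ruby18 ruby19 jruby"

# All of these have to be set before the inherit line.
RUBY_FAKEGEM_TASK_TEST="spec"                   # rake task running the tests
RUBY_FAKEGEM_TASK_DOC="rdoc"                    # rake task building the docs
RUBY_FAKEGEM_DOCDIR="doc"                       # where the generated docs end up
RUBY_FAKEGEM_EXTRADOC="README.rdoc CHANGES.txt" # standalone documentation files
RUBY_FAKEGEM_EXTRAINSTALL="templates"           # extra directory needed at runtime
RUBY_FAKEGEM_BINWRAP="example"                  # only wrap bin/example

inherit ruby-fakegem

DESCRIPTION="Placeholder description for a hypothetical gem"
HOMEPAGE="http://example.com/"
LICENSE="MIT"
SLOT="0"
KEYWORDS="~amd64"
IUSE=""

# Same-ABI dependencies go through the eclass helpers, which add the
# USE-based part; the second call is conditional on the test USE flag.
ruby_add_rdepend dev-ruby/example-dep
ruby_add_bdepend test dev-ruby/rspec

# Any-ABI dependencies (a wrapped C library, say) stay in the usual variables.
RDEPEND="${RDEPEND} dev-libs/libexample"
```

With everything parametrised like this, a simple gem should not need any each_ruby_* or all_ruby_* function of its own; those come into play only when the defaults don’t fit.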
What I have written here is obviously only part of the process that goes into making ebuilds for the new eclasses, but it should give enough details for now for other interested parties to start working on them, or even porting them. Just one note before I leave you to re-read this long and boring post: for a lot of packages, the gem does not provide documentation, or a way to generate it, or tests, or part of the datafiles needed for tests to run. In those cases you really need to use a tarball, which might come out of GitHub directly, if the repository is tagged, or might require you to toy with commit IDs to find the correct commit. Yup, it’s that fun!
I have ranted and ranted and ranted about Ruby packages not being good enough for packaging, I also have ranted about upstream developers not even getting their own testsuite cleared up, or being difficult to work with. I have complained about GitHub because of the way it allows to “fork” packages too easily. I’m not going to retract those notes, but… sometimes things do turn out pretty well.
In the past few days I’ve been working toward adding Rudy in tree, as I don’t want to keep on building my own slow, hard, and boring scripts to deal with EC2, and I’m spending more time understanding how to get EC2 working than writing the code I’m paid to write. As I wrote before, this is another of those compound projects that is split into a high number of small projects (some literally one source file per gem!). It worried me to begin with, but on the other hand, the result is altogether not bad.
Not only were the first fixes I had to apply to amazon-ec2 merged, and a new version released, the very night I sent them upstream (and added them to Gentoo, now gone already), but Delano (author of Rudy, and thus of lots of its dependencies) also quickly applied most of my changes to get rid of the mandatory requirement for hanna, even on some packages I hadn’t sent them for yet, and released them again. Of course the job is far from finished, as I haven’t reached Rudy itself yet, but the outcome is starting to look much nicer.
I also have good words for GitHub right now, since it makes it very easy and quick to take the code from another project, patch it up and send it to the original author to be merged (and hopefully re-released). This also works fine with patches coming from other contributors, like Thomas Enebo from JRuby, who sent me a fix (or “workaround” if you prefer, but it’s still a way to achieve the wanted result in a compatible way) to make newer matchy work properly with JRuby. On the whole, I have to say I’m getting quite positive about GitHub, but I’d very much like it if they allowed me to reply to the messages I receive by mail, rather than having to log in to the system. I positively hate multiple mail systems, Facebook’s as well as GitHub’s, as well as most forums’.
And for the shameless plug and trivia time, I have more repositories in my GitHub page than items in my wishlist …
Anyway, back to work now!
You might have noticed that I started moving (renaming) Ruby packages, both with old and new ebuilds, to drop the ruby- prefix from packages such as ruby-mmap, ruby-bz2 and ruby-fcgi. My reason to proceed in this way is not only to avoid duplication, but to have an extra safety that the name of the ebuild is going to correspond (minus casing, most of the time) with the gem’s name. This gets even more important since the fakegem eclass will default to use the ${PN} variable to do its magic tricks for fakegem handling.
But the tipping point for this set of changes (which aren’t really easy, nor transparent) has been the dev-ruby/ruby-fcgi package we used to ship. Even Alex assumed that the gem it referred to was ruby-fcgi, which does exist, but in truth it has always simply been fcgi, which is a different gem. To avoid these possible collisions in the future, the rule is going to be “use the original naming, don’t add prefixes!”. Obviously there are and there will be exceptions, such as the jruby-debug-base package that installs a gem named ruby-debug-base (this is to sidestep the dependency tracking in the original ruby-debug to let it load the JRuby specific code instead).
This post is not, though, about the naming scheme, but rather should give you an idea of why we still haven’t unleashed Ruby 1.9, not even as a secondary Ruby implementation (while JRuby is). You can easily read around the net that “$package now supports Ruby 1.9”… sometimes it’s true, sometimes it definitely is not. For instance, when they say that Rails 2.3.5 officially supports Ruby 1.9, they fail to tell you that builder (which is bundled by activesupport, and slightly patched, but we’re going to count the patched version anyway) looks pretty broken on Ruby 1.9, assuming its testsuite works as intended, which is how it appears to me. And the rest of the code does not seem to be much better: tmail fails its tests, among many.
In the case of fcgi (which used to be mandatory dependency of rails 2.3.5 in Gentoo, although I’ve dropped it in the ruby-ng port, as it’s not really that needed, and the gem itself does not depend on it), the original code (version 0.8.7) does not work on Ruby 1.9. And we knew that, Alex added a 1.9 compatibility patch in Gentoo before: it built, and we shipped it, but… was it tested? Nope, since the old eclasses had no support for testing. Actually, I hit this problem when, a few months ago, I added a further safety check for Ruby 1.9: the linked extensions are built with --no-undefined so that eventual undefined symbols won’t cause Ruby to abort at runtime (which happened to me before). Indeed, even though the extension “compiled”, it left the undefined symbols, so it could never be loaded properly at runtime, because the function it used are Ruby 1.8-only and not defined in 1.9 at all. At the end – both on the in-tree ebuild, and in the testing overlay for the new eclasses – I ended up disabling the native extension for anything that is not Ruby 1.8 (there is a pure ruby, slower implementation that works even on Ruby 1.9 and JRuby).
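The decision described here (native extension only on Ruby 1.8, pure-Ruby code everywhere else) can be sketched as a small predicate. This is only an illustration: `build_native_fcgi?` is a hypothetical name, and the real logic lives in the ebuild and eclasses, which are not shown in this post.

```ruby
# Hypothetical predicate: should the native fcgi extension be built for
# this interpreter? Only MRI 1.8 qualifies, per the reasoning above.
def build_native_fcgi?(engine, version)
  major, minor = version.split(".").first(2).map { |part| part.to_i }
  engine == "ruby" && major == 1 && minor == 8
end

# RUBY_ENGINE may not be defined on older 1.8 releases, so assume MRI there.
engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
if build_native_fcgi?(engine, RUBY_VERSION)
  puts "building the native extension"
else
  puts "using the pure-Ruby implementation"
end
```

Gating on the interpreter version is the blunt option; it trades speed on 1.9 for the certainty that nothing with unresolved symbols gets installed.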
But that’s for the “old” fcgi gem, let’s look at the description from the new one (emphasis mine; grammar quoted):
FastCGI is a language independent, scalable, open extension to CGI that provides high performance without the limitations of server specific APIs. For more information, see http://www.fastcgi.com/. This is the fork of fcgi implementation for ruby but with ruby1.9 – ruby1.9.1 compability
So this should actually have the extension working correctly with Ruby 1.9, you’d say. After all, the previous pure-Ruby implementation worked fine already, so there is nothing to do there. Okay, so let’s build it (after fixing a three-year-old bug):

    make -j12 -s
    fcgi.c: In function ‘fcgi_stream_puts_ary’:
    fcgi.c:285: warning: implicit declaration of function ‘rb_inspecting_p’ [-Wimplicit-function-declaration]
    fcgi.c: In function ‘fcgi_stream_puts’:
    fcgi.c:309: warning: implicit declaration of function ‘rb_protect_inspect’ [-Wimplicit-function-declaration]
    fcgi.o: In function `fcgi_stream_puts':
    /var/tmp/portage/dev-ruby/ruby-fcgi-0.8.9/work/ruby19/ruby-fcgi-0.8.9/ext/fcgi/fcgi.c:309: undefined reference to `rb_protect_inspect'
    fcgi.o: In function `fcgi_stream_puts_ary':
    /var/tmp/portage/dev-ruby/ruby-fcgi-0.8.9/work/ruby19/ruby-fcgi-0.8.9/ext/fcgi/fcgi.c:285: undefined reference to `rb_inspecting_p'
    /var/tmp/portage/dev-ruby/ruby-fcgi-0.8.9/work/ruby19/ruby-fcgi-0.8.9/ext/fcgi/fcgi.c:285: undefined reference to `rb_inspecting_p'
    collect2: ld returned 1 exit status
    make: *** [fcgi.so] Error 1
Guess what? That’s the same problem that the old fcgi had with Alex’s patch. It only fails at build time with the Gentoo version of Ruby 1.9 because we’re forcing --no-undefined; with other Ruby 1.9 packaging, you’ll get this to build… and then kill your Ruby process at runtime. So no, this gem is definitely not compatible with Ruby 1.9, even though it claims to be.
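If you want to pull the offending symbols out of a build log like the one above without reading it by eye, a few lines of Ruby are enough. This is a minimal sketch that assumes the GNU ld message format shown in the log (undefined reference to `symbol'); the method name `undefined_symbols` is mine, and the sample log is abridged from the output above.

```ruby
# Collect the distinct symbols that GNU ld reported as unresolved.
# Assumes messages of the form: undefined reference to `symbol'
def undefined_symbols(log)
  log.scan(/undefined reference to [`']([A-Za-z_]\w*)'/).flatten.uniq.sort
end

# Abridged from the build log above (paths shortened).
log = <<'LOG'
fcgi.c:309: undefined reference to `rb_protect_inspect'
fcgi.c:285: undefined reference to `rb_inspecting_p'
fcgi.c:285: undefined reference to `rb_inspecting_p'
collect2: ld returned 1 exit status
LOG

p undefined_symbols(log)  # prints ["rb_inspecting_p", "rb_protect_inspect"]
```

Note that this only helps when the link actually fails; without --no-undefined the linker stays silent and there is nothing in the log to scan.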
Now, there is another fork of the classic fcgi gem, with version 0.8.8; what changes in that one? Well, the first problem is that the version inside the gem’s content has not changed at all:

    flame@yamato fcgi % egrep '0\.8\.[78]' . -r
    ./README:Version 0.8.7
Is this enough to mutter “boooooring”? Maybe, but to be fair, let’s try building for Ruby 1.9:
    flame@yamato fcgi % ruby19 extconf.rb
    checking for fcgiapp.h... yes
    checking for FCGX_Accept() in -lfcgi... yes
    creating Makefile
    flame@yamato fcgi % make
    x86_64-pc-linux-gnu-gcc -I. -I/usr/include/ruby19-1.9.1/x86_64-linux -I/usr/include/ruby19-1.9.1/ruby/backward -I/usr/include/ruby19-1.9.1 -I. -DHAVE_FCGIAPP_H -fPIC -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -floop-block -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -Wformat=2 -Wno-error -Wno-pointer-sign -fdiagnostics-show-option -fno-strict-aliasing -O2 -g -Wall -Wno-parentheses -fPIC -o fcgi.o -c fcgi.c
    fcgi.c: In function ‘fcgi_stream_puts_ary’:
    fcgi.c:276: warning: implicit declaration of function ‘rb_inspecting_p’ [-Wimplicit-function-declaration]
    fcgi.c: In function ‘fcgi_stream_puts’:
    fcgi.c:300: warning: implicit declaration of function ‘rb_protect_inspect’ [-Wimplicit-function-declaration]
    x86_64-pc-linux-gnu-gcc -shared -o fcgi.so fcgi.o -L. -L/usr/lib64 -Wl,-R/usr/lib64 -L. -Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu -Wl,--sort-common -rdynamic -Wl,-export-dynamic -Wl,--no-undefined -Wl,-R -Wl,/usr/lib64 -L/usr/lib64 -lruby19 -lfcgi -lpthread -lrt -ldl -lcrypt -lm -lc
    fcgi.o: In function `fcgi_stream_puts':
    /home/flame/mytmpfs/fcgi/ext/fcgi/fcgi.c:300: undefined reference to `rb_protect_inspect'
    fcgi.o: In function `fcgi_stream_puts_ary':
    /home/flame/mytmpfs/fcgi/ext/fcgi/fcgi.c:276: undefined reference to `rb_inspecting_p'
    /home/flame/mytmpfs/fcgi/ext/fcgi/fcgi.c:276: undefined reference to `rb_inspecting_p'
    collect2: ld returned 1 exit status
    make: *** [fcgi.so] Error 1

Okay, so we’re back to the same problem: even this version, which is supposed to be fixed to work with Ruby 1.9, simply is not. It’s the old code bundled together with some patch, the same patch that Alex used, maybe even picked up from Gentoo in the first place. Now, I can’t just deduce from this that all the compatibility with 1.9 is done this way, but it sure should tell you a lot about what “Ruby 1.9-compatible” might mean.
Sigh!