Kernel Planet

July 04, 2009

Kernel Podcast: 2009/07/02 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090702.mp3

For Thursday, July 2nd 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: cgroups, kmemleak, OOM, VFAT, and the return of the Zero Page.

Cgroups. Paul Menage posted an RFC patch series intended to add Hierarchy Extensions to Cgroups. With the patch series applied, one gets named cgroup hierarchies, cgroup hierarchies with no bound subsystems, and cgroup subsystems that can be bound to multiple hierarchies. A number of example use cases are included in the patch series, which contains a total of 9 patches.

Kmemleak. In an effort to reduce false positive warnings, Catalin Marinas posted a patch that would better handle objects allocated during a kmemleak scan. With the patch applied, kmemleak will check to see if an allocation happened after it began scanning a list. If so, it will re-scan the list and repeat the allocation test before reporting any problems. If the system is simply too busy to ever scan the list without it changing, kmemleak gives up after a certain number of passes (25 in the posted patch). With these patches applied (and perhaps some others also – it was not specified which), Catalin is finding a lot of reports of iwlwifi leaking memory and is not sure whether these reports are still noise, or a legitimate bug that needs some attention. Catalin adds “I’m not familiar with this code so any help is greatly appreciated”. Perhaps someone will help take a look at that driver.
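The retry behaviour described above can be sketched as a toy Python model. This is not kmemleak's code: the names `scan_with_retries`, `scan_once`, and `make_fake_scanner` are hypothetical, and only the limit of 25 passes is taken from the summary of the posted patch.

```python
MAX_PASSES = 25  # pass limit mentioned for the posted patch


def scan_with_retries(scan_once, max_passes=MAX_PASSES):
    """Repeat a scan until no allocation happened while it was running.

    `scan_once` is a hypothetical callable returning a tuple
    (suspected_leaks, allocated_during_scan).
    Returns (suspected_leaks, passes_used), or (None, max_passes) when the
    object list never stayed stable long enough to trust a result.
    """
    for passes in range(1, max_passes + 1):
        leaks, allocated_during_scan = scan_once()
        if not allocated_during_scan:
            # The list did not change during the scan, so report it.
            return leaks, passes
    # Too busy: give up rather than report likely false positives.
    return None, max_passes


def make_fake_scanner(busy_passes):
    """Fake scanner that observes new allocations for `busy_passes` scans."""
    state = {"calls": 0}

    def scan_once():
        state["calls"] += 1
        still_busy = state["calls"] <= busy_passes
        return (["obj@0x1000"], still_busy)

    return scan_once


quiet_result = scan_with_retries(make_fake_scanner(busy_passes=2))
busy_result = scan_with_retries(make_fake_scanner(busy_passes=100))
```

In this model a system that settles down after two busy passes reports on the third, while a permanently busy one yields no report at all.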

OOM. Minchan Kim followed up to the ongoing debate about why exactly a specific patch (intended to affect only non-swap machines) had caused so many OOM-type situations, with a theory that the patch actually improved performance of page reclaim to the point where the specific tests being used would subsequently expose the system to a fork bomb. Minchan contended that David Howells had merely been “lucky” in his previous use of an unnecessary routine and informed us that Rik van Riel is currently working on a throttling version of page reclaim which should help.

VFAT. Discussion continues about the VFAT implementation. James Bottomley and Alan Cox had a debate concerning the ways in which vendors do and do not carry patches out-of-tree. Alan’s point was largely that vendors always carry patches regardless of the wishes of the kernel community, so one more isn’t a big deal, whereas James counter-argued that this went against the general notion that the kernel community was against long-lived out-of-tree bits. James defended the involvement of the Linux Foundation against what he described as “conspiracy theories” of anything more sinister going on. Later, Jan Engelhardt, Ted Ts’o, and Andrew Tridgell had a dialog concerning a number of devices Jan had found that broke when given filesystems modified using the various VFAT patches currently going around.

Zero Page (again). Kamezawa Hiroyu posted concerning the removal (in 2.6.24, back in October of 2007) of zero page support (essentially allowing things like sparse unbacked array allocations in userspace that are otherwise contiguous), noting that many customers and users of his haven’t really noticed that it was removed because they’re still using kernels like the 2.6.18 kernel in Red Hat Enterprise Linux 5. Recently, he has also seen an uptick in intentional users of zero page (to which Avi Kivity later added KVM in terms of its migration process) and so suggests a re-implementation that fixes many of the reference-counting and other problems from the old one.

In today’s miscellaneous items: Alek Du posted version 2 of his GPIO driver for Intel Moorestown, Catalin Marinas experienced a problem fixing an issue with kmemleak in which piped reads from the kmemleak debugfs interface would result in a lock being held on return to userspace, freed later, but obviously generating a warning from lockdep. Gregory Haskins posted version 9 of his irqfd patch series, including fixes and rebasing on a more recent KVM. Justin P. Mattock posted a bunch of SELinux updates (some labeled “non-trivial”, some simply typos and things of that nature), Vivek Goyal posted version 6 of his IO Scheduler based IO Controller patches (including mostly split out patches and some fixes), Doug Graham posted (with a Nortel address), a V3 Minix filesystem fix for big endian systems (or those with >64K inodes) – turns out someone does still use Minix – and several other generic fixes, Venkatesh Pallipadi posted some fixes to Dave Jones for cpufreq lockdep warnings, James Bottomley posted a small number of SCSI fixes against 2.6.31-rc1, Jonathan Cameron posted version 4 of his Industrial I/O Subsystem patches (which seem to include yet another ring buffer implementation? I didn’t check yet), Alan Cox followed up to Lennart Poettering’s previous VT_WAITACTIVE patch with a more generic event interface for Virtual Terminals, and Chris Mason posted a series of btrfs updates (mostly small bug fixes, but also a first step toward snapshot deletion from Yan Zheng), intended to still make it into 2.6.31-rc2.

Finally today. Kumar Gala posted asking Alan Cox if he had some good examples of users of the tty layer to use as a reference in bringing an out-of-tree serial driver for the Avocent ESP-16 MI Serial Hubs (serial over Ethernet) up to scratch for mainline inclusion.

In today’s announcements: Karel Zak posted RC2 of util-linux-ng v2.16. It includes the moving of the libuuid library from e2fsprogs into util-linux-ng. Kay Sievers noted that it fails to build in a clean chroot due to an install hook hack that moves some files around during the build. And Jaswinder Singh Rajput helpfully mailed a number of people concerning various feature removal dates that were previously committed to and have lapsed or are pending. At least one of these reminders has resulted in the related feature being killed by a subsequent patch, and will hopefully lead others to contemplate likewise.

Ryo Tsuruta announced the IO Controller Mini-Summit in Japan in October, which will immediately precede the Linux Kernel Summit. This is an event that was hinted at previously, although details remain “sketchy” and it’s not entirely clear who will be there at this time.

The latest kernel release is 2.6.31-rc1, which was released by Linus last week.

Andrew Morton posted an mm-of-the-moment for 2009-07-02-19-57. Meanwhile, various users of the previous mm-of-the-moment have reported a few glitches.

Greg Kroah-Hartman announced releases 2.6.27.26, 2.6.29.6, and 2.6.30.1 of the kernel. He strongly encourages users to upgrade, and notes that the .29 update will be the last so users should migrate to 2.6.30 as soon as possible.

Stephen Rothwell posted a linux-next tree for July 2nd. Since Wednesday, the “sfi” tree is still dropped, the tree still hates powerpc allyesconfig, and several conflicts went away also. The total sub-tree count in the latest compose stands at a respectable 132 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

July 04, 2009 01:38 AM

July 03, 2009

Kernel Podcast: 2009/07/01 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090701.mp3

For Canada Day (Wednesday, July 1st 2009), I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Encrypting the page cache, VFAT, and XFS.

Encrypting the page cache. There is an RFC floating around from Jeremy Maitin-Shepard for TuxOnIce (the out-of-tree alternative suspend code) concerning encrypting the page cache in RAM on suspend-to-RAM so that the system cannot be cold booted into another kernel and its memory content analyzed. Such an attack is not too far fetched – research over the past few years has shown that it is not only feasible but is also actively being used. The only real concern surrounding this seems to be the overhead involved, though it is likely to be a configurable option, if Jeremy follows up to the RFC with some patches.

VFAT. Ongoing discussion of the VFAT patch proposed by Andrew Tridgell seems to have headed toward changes to VFAT support possibly necessitating that the additional configuration options be part of a new (perhaps aptly named) alternative VFAT filesystem. Posters have pointed out that some windows systems choke when working with these patches and we probably don’t want those systems to bluescreen when reading from disks written to using Linux systems. As an aside, this author is always amused by the configurable nature of the Windows bluescreen – I recall a much younger version of myself in college winding up Microsoft support one day by inquiring about the registry codes to set the bluescreen background colour. Childish for sure, but I recall it being quite fun at the time – there are sixteen glorious color options!

XFS. Christoph Hellwig posted an update about XFS support, in which he noted that the 2.6.30 kernel had incorporated fixes for ENOSPC handling, and had additionally shrunk by 500 lines in the latest release. He notes that in the current merge window, quotaops went away (which simplifies quotas), and XFS dropped its own POSIX ACL implementation in favor of the generic in-kernel one. This author recently noticed Andreas updated his acl git tree with a note that the official location for those bits is now Savannah, as Christoph mentions in his summary also. Seems like an exciting time for XFS again.

Miscellaneous fixes include: fixes to round_up/down from H Peter Anvin, some minor Super-H updates from Paul Mundt, some race fixes for irqfd/eventfd from Gregory Haskins, some fixes for FUSE (Miklos Szeredi) that include a couple of minor features which might not be allowed in rc2, cache events in perf from Jaswinder Singh Rajput (kudos to you, Jaswinder, for your continued efforts to make cleanups and fixes to the kernel – it doesn’t all go unnoticed either), a per-GPIO sysfs symlinking naming patch from Jani Nikula, a scheduling while atomic bug introduced by kmemleak is fixed by a patch from Ingo Molnar, a disabling of CLONE_PARENT for the init task from Sukadev Bhattiprolu, a large number of networking bits from David Miller (including an important SCTP fix), some md updates from Neil Brown (implementing mostly the new ‘topology’ numbers), version 6 of the Intel Trusted Execution (Boot) Technology covered several times previously in this podcast, and some block bits for 2.6.31-rc2 from Jens Axboe (including a removal of __GFP_NOFAIL misuse in CFQ that was debated a little over last week). Steven Rostedt noticed that the gcov patches don’t protect against modification to vsyscall memory (as his ftrace patches do) so fail to boot on his test system, falling over inside the initrd init.

This podcast is now two months old. You’ll notice we’re a little behind again this week (it won’t get any easier next week as your author is in Japan only from Monday to Friday and will be fighting severe jetlag over the next two weekends in addition to being on planes for 32 hours and trying to write a chapter on the plane). It’s tricky to do a podcast every day, but I enjoy it, so I plan instead to continue to make a best effort – and occasionally live with being a few days out of sync. If you would like to see a guaranteed daily service, feel free to volunteer to help do that. Meanwhile, I’ll keep the podcasts coming, you just keep listening and sending me those nice mails. And if you happen to be in Tokyo on an evening next week, let’s have some coffee!

In today’s announcements: LTTng 0.142. Mathieu Desnoyers announced version 0.142 of LTTng. This includes a fix to text poke he posted about previously. Also, the Linux Test Project has been released for June 2009. Subrata Modak posted to let us know that 6 testcases have been added covering 5 new system calls, major syscall and power management test fixes have been added, and a number of additional test fixes have been added. The healthy number of contributors to June 2009 LTP include the ever-involved Darren Hart and a large number of others also. And this is certainly a worthy project to help out on if you’ve got some spare cycles these days.

The latest kernel release is 2.6.31-rc1, which was released by Linus last week.

Greg Kroah-Hartman is preparing new .27, .29, and .30 stable releases for which he sent out 30, 35 and a whopping 108 proposed patches respectively for review on Tuesday evening ahead of a deadline on Friday morning.

Stephen Rothwell posted a linux-next tree for July 1st. He dropped the new “sfi” tree due to build problems, powerpc still fails allyesconfig, and the tree overall gained conflicts over the previous day. The total sub-tree count is now up to 132 trees in the current linux-next compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

July 03, 2009 09:06 PM

Valerie Aurora: Soft updates explained

My latest article for LWN explains (!) soft updates. The "(!)" is because soft updates are notoriously difficult to understand. If you go to a file systems conference and get people drunk, they will eventually confide to you that they don't really understand soft updates either.

Soft updates, hard problems

This is a free link; if you like the article, please consider subscribing to LWN. You'll still need an account if you want to make snide comments on the article. :)

July 03, 2009 04:34 PM

Evgeniy Polyakov: Say NO to SQL database storages?

The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive.

Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.

New data storages are getting more and more attention, especially from new projects which manage huge amounts of data. Conventional projects still use SQL and RDBMS though, and while it requires a fair amount of overhead, it does not force rethinking and redesigning of the common data storage approach, since it is rather simple to work with.

But things are changing.

Article is available on ComputerWorld.

July 03, 2009 02:14 PM

Harald Welte: Wireshark packet dissector for GSM 12.21 (A-bis OML)

During the last weeks I've been spending some time to start a wireshark dissector plugin for GSM 12.21, which is the Operation and Maintenance protocol between BSC and BTS. Using this protocol, many aspects of a BTS are configured by the BSC.

I have already implemented the BSC side of 12.21 inside OpenBSC, and OpenBSC contains parsing code and debug logs about what is happening on this protocol. However, I think it is much better to remove most of that debug printing code from OpenBSC and move it into wireshark. Whoever needs per-message debugging, can start wireshark and look at the output - with the advantage of extensive filtering capabilities.

The protocol is quite complex and has many different messages with each their own set of attributes. So the current work is far from being complete, but it's already at a point where it is really useful.

I've put a specific focus on implementing the vendor-specific bits for ip.access, since those are hard to figure out and much more difficult to implement for anyone who hasn't spent as many weeks looking at hexdumps from their Abis-IP protocol as me. Parsing standard 12.21 messages is easy, just read the publicly-available spec and add wireshark code for it.

In case you're interested, the plugin is available from this path in the OpenBSC git tree

July 03, 2009 02:00 AM

July 02, 2009

Pavel Machek: According to yr.no, world is going to end tomorrow

...at 14:00 Prague time.

([Un]fortunately, prediction was already updated).

Oh and BTW... weather predictions seem to be quite a way off in the last few days. Always predicting rain... and then there's a sunny day with storm in a distance. (Ok, yesterday we got storm very close, and we did not make it to the stables fast enough -- could not gallop with all the children -- so we were totally wet, but....) I guess storms are hard to predict?

July 02, 2009 10:23 PM

Kernel Podcast: 2009/06/30 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090630.mp3

For Tuesday, June 30th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: fanotify, GPL, KVM, Modules, OOM, Real Time, Tasks, VFAT, VFS, and Virtual Terminals.

fanotify. Something we missed on Monday. Eric Paris posted a new version of fanotify, a notification mechanism originally designed to aid “anti-malware” vendors. The patch adds two key things, the ability to receive a read only fd pointing to modified filesystem objects (so they can be scanned for malware), and an access system in which processes may be blocked until an fanotify userspace listener has decided if whatever they were trying to do should be allowed (once they have been deemed “clean”). Eric reminds us that this is not an LSM, is not intended to provide system security, is not intended to prevent malware from running on Linux, but is merely intended to support on-access file scanning operations. Valdis Kletnieks followed up to say that he doesn’t care about virus scanners but that this could be useful for HSM applications.

GPL. Andrey Volkov posted an email about the ASUSTek Computer WMVN25E2+ WiMAX Subscriber Station, in which he alleges that this product is infringing upon the GPL because it allegedly includes GPL software (he lists the versions) for which no source code is made available, and no offer of source is made under some kind of “intellectual property” defense. One hopes that Andrey has tried other avenues of communication before emailing the LKML about it, since many companies can come around to doing the right thing with private prodding.

KVM. Gleb Natapov posted to let us know that KVM would like to provide the x2APIC interface to a guest without emulating interrupt remapping. KVM prefers this because x2APIC is better virtualizable and provides better performance than the MMIO xAPIC interface (Gleb cites examples of why this is the case). The patch changes x2APIC enabling so that it is enabled on KVM guests, even if interrupt remapping initialization failed.

Modules. Jan Beulich posted a patch reducing exported symbol CRC table size on 64-bit architectures. He does this by ensuring that these quantities are actually only stored as the 32-bit quantities they are (using assembly wrappers) rather than the 64 bits that are used when gcc is left to its own devices. By applying this patch, one saves 16k of kernel resident size, 2k module resident size, and a whopping 1M of vmlinux image size. On an unrelated note Jan also posted a patch replacing uses of num_physpages by totalram_pages since many memory sizing calculations should be influenced only by usable memory, not just the total number of physical pages (perhaps including lots of non-RAM).

OOM. Ongoing discussion of a patch intended for swapless systems that incorrectly also affected those with swap and caused a lot of OOM situations in 2.6.30 kernels onward (especially for David Howells, who found it) led to the suggestion from Mel Gorman that OOM situations would also cause the kernel to print out the full active_anon LRU list – so that developers can figure out what pages are still on the active_anon list in that case, and – more importantly – which of those should not be there.

Real Time. Zoltan Bus posted a message saying that wake_up() sometimes isn’t waking up real-time priority tasks when called from an interrupt handler. This is interesting timing, because this author also heard just yesterday that (on the RT kernel) he should also never be calling wake_up from interrupt thread context. The correct places to use wake_up are probably worth documenting.

Tasks. Oleg Nesterov posted an RFC patch entitled “do not place sub-threads on task_struct->children list”. Currently, Linux systems add sub-threads to the ->real_parent->children list, but this only really serves to slow down do_wait. With this patch, ->children contains only the main threads (group leaders). Roland McGrath thought this seemed mostly like the right idea.
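To make the effect on the list walk concrete, here is a small Python model of the two policies. It is only an illustration under stated assumptions: `Task`, `add_child`, and `build_parent` are hypothetical names, and the real kernel structures and do_wait logic are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    pid: int
    tgid: int  # thread group id; equals pid for the group leader
    children: list = field(default_factory=list)

    @property
    def is_group_leader(self):
        return self.pid == self.tgid


def add_child(parent, task, leaders_only):
    """Link `task` under `parent`; optionally skip sub-threads."""
    if leaders_only and not task.is_group_leader:
        return  # sub-thread: not placed on the parent's children list
    parent.children.append(task)


def build_parent(leaders_only):
    parent = Task(pid=1, tgid=1)
    # One child process (pid 100) that has three additional threads.
    for pid in (100, 101, 102, 103):
        add_child(parent, Task(pid=pid, tgid=100), leaders_only)
    return parent


# Number of entries a wait-style walk over ->children would visit.
old_walk = len(build_parent(leaders_only=False).children)
new_walk = len(build_parent(leaders_only=True).children)
```

With one child process running four threads, the walk shrinks from four entries to one in this model.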

VFAT. Ongoing discussion of Andrew Tridgell’s latest patch. At the same time Hirofumi Ogawa followed up to ask whether it isn’t about time to change the default shortname=lower mount option for vfat filesystems. This is known to cause problems when copying files from one filesystem to another on Linux, affecting the originally intended case of the files. It also is inconsistent with respect to Windows behavior, whereas (as Jamie Lokier also agreed), shortname=mixed is a more sensible default.

VFS. In an evolution of previous discussion, Miklos Szeredi posted a new patch implementing a new O_NODE flag for open calls. Opening a file in such a way will not call the driver’s ->open() method and will not have any side effect other than referencing the dentry/vfsmount from the struct file pointer. This can be layered with other options to implement some useful features.

VTs. Lennart Poettering posted a patch implementing an extension to VT_WAITACTIVE such that it is possible to wait until a specific VT becomes inactive. This will allow ConsoleKit to more easily keep track of which VT is the active one. Currently, ConsoleKit (which is used by distributions such as Fedora to assist with Fast User Switching of Graphical Desktops) creates 64 separate userspace threads, one for each theoretical VT, and calls VT_WAITACTIVE in them to look for changes. With this patch, ConsoleKit will instead be able to only monitor whatever it considers to be the current VT.

In today’s miscellaneous items: IDE and Network fixes from David Miller (including a migration to generic block layer request completion on IDE for some legacy code, and some small fixes for the new ZigBee networking stack), Kmemleak fixes (Catalin Marinas – these are intended to reduce the “false positive” noise that Dave Jones and Ingo Molnar, and others, were seeing before), two “important” device-mapper fixes for 2.6.31-rc2 (Alasdair Kergon), a bunch of useful performance counter tools fixes from Arnaldo Carvalho de Melo (acme) adding filtering by comm, dso and symbol lists to perf (Paul Mackerras also posted a fix to enable counters only on next exec, allowing one to skip the launching process in profiling, and another allowing one to exclude any overhead caused by the presence of a hypervisor such as KVM). Ingo Molnar thinks Robin Getz’s CON_BOOT idea is “genuinely useful”. Finally, Amerigo Wang posted an update to kcore sizing calculation that should actually do the right thing this time around.

[new segment] In today’s blog postings: Dave Jones ponders aloud creating “rawhide”-like kernel builds for stable Fedora releases, to increase test coverage and (hopefully) make released kernels even better overall. If you’d like your blog included here, simply have it added to planet.kernel.org. If you do that, it’s assumed you don’t mind being featured in such things.

The latest kernel release is 2.6.31-rc1, which was released by Linus last week.

Andrew Morton posted an mm-of-the-moment for 2009-06-30-12-50. It contains the usual impossibly large number of patches to itemize here.

Stephen Rothwell posted a linux-next tree for June 30th. Since Monday, he adds a new tree entitled “sfi” (which was immediately dropped due to build problems), his fixes tree still contains that fbdev fix, and our old friend of the powerpc build configuration problem in an allyesconfig is back. There are a total of 132 sub-trees in the linux-next compose now.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

July 02, 2009 04:29 PM

Kernel Podcast: 2009/06/29 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090629.mp3

For Monday, June 29th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Boot consoles, HTC Dream, KSM, and Performance Counters.

Boot Consoles. Robin Getz (of the Blackfin project) posted an RFC allowing more than one “boot console”. Boot consoles are intended for early in boot, and rely upon an additional parameter of CON_BOOT passed to register_console. Existing Linux systems allow only one boot console at a time – the first time a call to register_console is performed without CON_BOOT, any boot console is silently unregistered. Robin’s patch allows multiple boot consoles.
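The registration rules described above can be modelled in a few lines of Python. This is a toy sketch, not the kernel's register_console: the `ConsoleList` class, the console names, and the flag value are illustrative assumptions, and so are two behaviours the summary does not spell out (that a second boot console is simply refused today, and that the first real console still drops all boot consoles with the patch applied).

```python
CON_BOOT = 0x8  # illustrative flag value for this sketch


class ConsoleList:
    def __init__(self, allow_multiple_boot):
        self.allow_multiple_boot = allow_multiple_boot
        self.consoles = []  # list of (name, flags)

    def boot_consoles(self):
        return [name for name, flags in self.consoles if flags & CON_BOOT]

    def register_console(self, name, flags=0):
        if flags & CON_BOOT:
            if self.boot_consoles() and not self.allow_multiple_boot:
                return False  # assumption: a second boot console is refused
        else:
            # A real console arrives: silently drop every boot console.
            self.consoles = [c for c in self.consoles if not c[1] & CON_BOOT]
        self.consoles.append((name, flags))
        return True


# Existing behaviour: only one boot console at a time.
single = ConsoleList(allow_multiple_boot=False)
single.register_console("early_serial", CON_BOOT)
second_accepted = single.register_console("early_vga", CON_BOOT)

# Behaviour with multiple boot consoles allowed.
multi = ConsoleList(allow_multiple_boot=True)
multi.register_console("early_serial", CON_BOOT)
multi.register_console("early_vga", CON_BOOT)
boot_before = multi.boot_consoles()
multi.register_console("ttyS0")
boot_after = multi.boot_consoles()
```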

HTC. The HTC Dream is one of the latest generation of Android phones. It features a 3.2″ screen, full qwerty keyboard, and many other features. Pavel Machek has been massaging getting various features into the upstream kernel, in collaboration with Brian Swetland (Google), and other folks. In his latest email, Pavel posts a driver of “higher quality than usual for staging/” to the staging tree, on the grounds that some cleanup is still required.

KSM. Kernel Shared Memory is a feature in which the kernel is able to scan physical page frames and reconcile duplicates into copy-on-write instances, saving physical memory in situations where the kernel would otherwise be happily oblivious to duplication. This is particularly important for systems running large numbers of virtual machines. Various work has gone into design changes to KSM, the latest of which is another version of an madvise version of KSM from Hugh Dickins. Hugh says the patch is not ready for inclusion, but is intended for wider testing and exposure while it is being cleaned up. On a memory related tangent, Dan Magenheimer followed up to the original “Transcendent Memory for Linux” announcement with links to documentation. This is an effort to allow Linux to use flexible memory that might disappear at any moment from underneath it – for example for caching purposes.
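The idea of reconciling duplicate pages into copy-on-write instances can be illustrated with a toy Python model. Everything here is a simplification for illustration: `MergedMemory` and its methods are hypothetical names, and the dictionary lookup on page contents stands in for whatever comparison scheme the real implementation uses.

```python
class MergedMemory:
    """Toy model: virtual pages map to frames; duplicates share one frame."""

    def __init__(self, pages):
        self.frames = {}    # frame id -> page contents (bytes)
        self.mapping = {}   # virtual page number -> frame id
        self._next = 0
        for vpn, content in enumerate(pages):
            self.mapping[vpn] = self._new_frame(content)

    def _new_frame(self, content):
        fid = self._next
        self._next += 1
        self.frames[fid] = content
        return fid

    def merge_duplicates(self):
        """Point every page with identical contents at a single frame."""
        keeper_for = {}  # contents -> frame id that is kept
        for vpn in sorted(self.mapping):
            content = self.frames[self.mapping[vpn]]
            self.mapping[vpn] = keeper_for.setdefault(content, self.mapping[vpn])
        # Free frames that no virtual page references any more.
        live = set(self.mapping.values())
        self.frames = {fid: c for fid, c in self.frames.items() if fid in live}

    def write(self, vpn, content):
        """Copy-on-write: a shared frame is never modified in place."""
        fid = self.mapping[vpn]
        sharers = sum(1 for f in self.mapping.values() if f == fid)
        if sharers > 1:
            self.mapping[vpn] = self._new_frame(content)
        else:
            self.frames[fid] = content


zero = bytes(4096)
mem = MergedMemory([zero, b"A" * 4096, zero, zero])
frames_before = len(mem.frames)
mem.merge_duplicates()
frames_after = len(mem.frames)
mem.write(2, b"B" * 4096)       # breaks sharing for page 2 only
frames_after_write = len(mem.frames)
```

Three identical zero-filled pages collapse into one frame, and writing to one of them gives that page a private copy while the other two keep sharing.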

Performance counters. In an ongoing discussion concerning measurement overhead imposed by the “perf” utility, Paul Mackerras came up with 4 possible solutions, one of which Ingo Molnar (who maintains “perf”) deemed “looks convincingly elegant”. Paul had suggested that “perf” call an execvp with an invalid NULL program name before turning on counters and re-execing with the actual program name so as to reduce PLT (symbol) resolver overhead.

In today’s miscellaneous items: David Howells and co are still working to figure out exactly how the git commit that caused OOM situations did so (he has David Woodhouse running a system reconfigured with mem=1G – rather than its stock 4GB on which it would not run out of memory – for a reproducer), Yanmin Zhang announced a 16% regression in ffsb test cases on JBODs (Just a Bunch Of Disks – typically 13/12 disks in this particular case), a regression fix for x86 from Ingo Molnar, Liqin Chen posted a bunch of cleanups to S+Core, and Gregory Haskins posted fixes to his irqfd/eventfd implementation for use by virtual machines.

Finally today, James Dolan inquired as to the advantages (or disadvantages) of testing a kernel in a Virtual Machine. Overall, the response was positive, even if there are obvious issues testing without real hardware. This author certainly extensively uses KVM for testing quick kernel builds.

The latest kernel release is 2.6.31-rc1, which was released by Linus last week.

Stephen Rothwell posted a linux-next tree for June 29th. Since Friday, he added a new subtree entitled “percpu”, and his fixes tree contains a fix for fbdev. The majority of other trees lost build failures. There are now 131 trees in the linux-next compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

July 02, 2009 07:24 AM

Dave Airlie: radeon DDX has initial KMS support

So we've had an -ati DDX with KMS support in a branch for quite a while, but it was starting to grow into a very big mess, and some of the hacks in it were quite unmaintainable.

So started cleaning it up and pushing the bits to master.

Step 1 was adding macros in all the places in the accel code to abstract away the different command submission methods, and add some ifs to do KMS specific things. Once this was done in theory the accel code wouldn't functionally regress and wouldn't require any more changes.

Step 2 was bringing over the kms and DRI2 support files from the branch.

Step 3 was making the decision between if (kms) if (!kms) blocks all over radeon_driver.c in all the various functions or having a nearly completely separate KMS/DRI2 driver file. The original code has the if approach and it was an unmaintainable nightmare, so I opted for the second approach and it definitely is the best. At driver probe time in radeon_probe.c, I now do the KMS check if the pciaccess probe is called (kms without pciaccess is probably not going to matter). If I get KMS supported the driver picks a nearly completely different set of functions for PreInit/ScreenInit etc. I now have a separate radeon_kms.c file which has all the DDX interface in it. Of course if we have any changes they may need to be done in two places, but it's a lot cleaner than it was in the other codebase.
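The design choice in step 3, selecting a whole set of entry points once at probe time rather than branching inside each function, can be sketched as follows. This is a Python illustration only: the real driver is C, and the function names, table names, and `probe` signature here are hypothetical stand-ins rather than anything from the radeon source.

```python
def legacy_pre_init():
    return "legacy PreInit"


def legacy_screen_init():
    return "legacy ScreenInit"


def kms_pre_init():
    return "KMS PreInit"


def kms_screen_init():
    return "KMS ScreenInit"


# Two separate entry-point tables instead of if (kms) blocks in each function.
LEGACY_FUNCS = {"PreInit": legacy_pre_init, "ScreenInit": legacy_screen_init}
KMS_FUNCS = {"PreInit": kms_pre_init, "ScreenInit": kms_screen_init}


def probe(pciaccess_probe, kms_supported):
    """Pick the whole entry-point table once, at probe time."""
    if pciaccess_probe and kms_supported:
        return KMS_FUNCS
    return LEGACY_FUNCS


kms_driver = probe(pciaccess_probe=True, kms_supported=True)
legacy_driver = probe(pciaccess_probe=False, kms_supported=True)
kms_result = kms_driver["PreInit"]()
legacy_result = legacy_driver["ScreenInit"]()
```

The cost noted in the post still applies to this shape: a change common to both paths has to be made in both sets of functions.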

I also ported the KMS code to use the libdrm_radeon buffer management code which is shared with mesa, instead of the DDX having its own buffer manager code base. This code is well tested via mesa, however it hasn't got all the features/optimisations that I've added to the DDX bufmgr over the last while.

So what's left:
Missing optimisation from old buffer manager:
1. buffer in VRAM? has the buffer ever explicitly been validated to VRAM, this allows for an optimisation on download from screen.
2. Download from screen, with driver pixmaps we don't know if the buffer is in VRAM or GTT, with (1) we can either blit if in VRAM, or just memcpy if in GART.
3. force a buffer to validate in GTT or force a buffer to stay in fast CPU access space - this was really useful for some sw fallbacks where a buffer would end up in VRAM and then get used by the CPU from there. It probably only really takes the place of fixing EXA properly so the pixmap scoring is separate from the offscreen memory, and having a XA that works with driver pixmaps a lot better.
4. bugs and crashes: I appear to be hitting a realloc crash at some point where glibc reenters itself and fails.

July 02, 2009 06:09 AM

July 01, 2009

Evgeniy Polyakov: Linux-mag on POHMELFS: From Russia with love

There is a new file distributed file system in the staging area of the 2.6.30 kernel called POHMELFS. Sporting better performance than classic NFS, it’s definitely worth a look.
...
Evgeniy Polyakov, a long time Linux hacker, has recently contributed a new distributed file system, called POHMELFS (Parallel Optimized Host Message Exchange Layered File System). It has appeared in the “chock-full-of-filesystems” kernel version 2.6.30 in the staging area. It is ready for testing and can give you a boost in performance (remember - it’s parallel!). This article will discuss POHMELFS and where it is headed.

An interesting article about POHMELFS, its state and future, namely elliptics network integration (with details on how it works and what to expect), NFS and its limitations.

Many thanks to Jon Smirl for the link.

July 01, 2009 02:55 PM

Evgeniy Polyakov: oLOLO-intellect

I did not blog about technical stuff for a while since I was reading some articles and books here and there about related topics. To date I did not get enough ground to start development, but there is already a short todo list to be completed in a week or so (well, only first couple of tasks are supposed to be finished, I will see later how it will go, it is unfeasible to finish them all quickly).

So, how knowledge extraction and cognition development processes are about to be implemented with time:

To date I'm somewhere at the very beginning. Back to the reading room...

July 01, 2009 02:12 PM

Evgeniy Polyakov: The scale

I have a good contact with at least 3 musicians who finished musical schools and college in the past, and at least one of them still plays (in the neighbour room in the office).
They all saw my miserable and ugly attempts to play some notes and no fscking one (although I have to admit that I rarely 'play' in public) said that scales exercises made from different notes in different keys are so extremely cool!

And they are - scales just rock! Not because they sound good, they are pretty usual and boring (although they also can be played with some fun), but after one somewhat memorized fingering of one or another scale playing its notes in the order you imagined in the head sounds just great.

Of course there are some additional notes which sound great in given rhythm and tonality, but even sounds from the simplest C major scale (C-dur iirc) form a nice melody. Spent an hour today morning playing various bits, and would continue further if not needed to move to the office.

And although I would not call it a good playing or somewhat interesting for more or less good musician, I liked it alot. Decided to make further lessons playing that way - memorizing scales in different keys and then improvise some bits 'there'.

Given that I have some troubles understanding time duration, it is rather hard to play very interesting albeit simple melodies. Actually not to play but to 'create' (in quotes since those simple melodies unlikely to require somewhat serious efforts from me, so I do not count it as a creation :). And although I try to play with the metronome, what I jam on my own is rather far from those sharp duration edges...

Ordered a set of mutes from UK (thanks to Meph who will bring it to Russia in a month or so, since no internet shop wants to ship it) - the more I learn the more I like and want to play. So decided to jam when I have a mood, and looks like office is a really good place for this.
Getting my playing skills it is better (and safer for myself) to play so that no one really hears that :)

Got Yamaha Silent Brass system (to play in office, yeah), cup mute to play at home when time does not permit and a harmon mute (eventually I will be a great jazz player, although maybe not in this life).
In late August this all will be here, but even without I want to continue. And to enjoy. And I do. Hope you too.

July 01, 2009 12:00 PM

James Morris: All my talk slides are now on Slideshare

I've uploaded the slides from essentially all of the talks I've given to Slideshare. This is likely more useful than my previous strategy of dumping them in a directory and leaving the rest up to search engine bots.

Click here for the full list of slides. They are all published under the Creative Commons attribution share-alike license.

One interesting slide title, which I'd forgotten about, is Kernel Security for 2.8, from the 2004 Kernel Summit. This was from when we were still expecting a 2.7 development kernel leading to a 2.8 stable kernel -- I think Linus announced the change in development model at that summit.

Included in this set of slides are several introductory and deeper technical overviews of SELinux; I hope they are useful for people who are looking for information for themselves, or if making their own slides. As the license suggests, please feel free to copy and extend them (but note that the older ones are going to be more out of date).

July 01, 2009 03:58 AM

June 30, 2009

Pete Zaitcev: GIMP and the value of standard window controls

The individual who came up with the current "toolband" UI for GIMP should be made to use it on a netbook, while an eagle pecks on his liver. For crying out loud I'm having trouble placing windows on a 1440x900 screen.

June 30, 2009 07:05 PM

June 29, 2009

Dave Jones: Increasing testing of unreleased kernels.

This past weekend I’ve been thinking of reviving an idea that has come up countless times. Producing RPM builds of the rawhide kernel for our already released Fedoras. The reason for not doing them so far has come down to bandwidth. (in terms of build system throughput, disk space, mirroring, and people bandwidth).

What I’m toying with doing is some devel kernels for Fedora 11 that are built outside of the Fedora build system. The Fedora kernel team now has enough build bandwidth for x86-[64] that we can actually get builds for those architectures done faster than koji.

Disk space – I’m thinking of just keeping the last 2-3 builds available.

Mirroring – Instead of having these be part of Fedora proper, I think an external repo on something like fedorapeople.org will suffice.

Which just leaves people bandwidth. For the most part the work is going to be just regularly syncing the devel/ branch with a CVS branch of F-11/ For some of this work, some scripting could be done to alleviate some of the pain. Also the frequency at which we push out these builds will determine the pain point. Perhaps every -git isn’t particularly valuable anyway. One build every handful of -git’s should be sufficient for bisecting.

There does remain one additional barrier. Occasionally we introduce something in rawhide builds which just won’t work on F11. For example, the kernel modesetting patches are tied closely to Xorg packages. Sometimes upstream changes require changes in mkinitrd or udev or some other ‘plumbing’. Some of these are regressions, and hopefully by identifying them sooner we can get them reverted/fixed upstream. Sometimes however, things get deprecated, and we need to change these packages. I’m not sure how to cope with this yet in a devel-for-F11 scenario.

One other thing that might be fun to throw into this would be the generation of -vanilla packages. The only reasons we don’t do these as part of the regular kernel builds is the various bandwidth concerns above. The specfile copes with spitting out RPMs with very little work needed. Josh Boyer has been occasionally doing these builds, though there hasn’t been a huge uptake. It’s unclear if this is due to lack of interest, or just a lack of publicity.

Another question to be answered is whether we go the route of enabling debugging in all builds as we do in rawhide, or do separate -debug builds. I’m leaning towards the latter.

I’m not committing anything to this for sure just yet, but it’s something I’ve been giving quite a bit of thought. There are still a bunch of unanswered questions.

Post from: codemonkey.org.uk


June 29, 2009 09:58 PM

Pete Zaitcev: Rebuilding VLC in Rawhide

It looks like I have to rebuild VLC for F12 Rawhide, since it's holding up yum update and its dependencies are too expansive to be worked around with --skip-broken. I kept hoping that Nicolas would do his duty but apparently it was in vain, while everything else is being built, even Pidgin. The downside of this is that I have better things to do than compensating for an AWOL maintainer, but an upside exists too. It's past due that I learned all the fancy tools Fedora grew over the years, such as Mock. Also, if I somehow work this into Koji, fruits of my labour will be available to everyone.

UPDATE: It turns out I need to be a registered developer at RPMfusion before I can use the tools, so I just built it from an src.rpm, and so I'm in possession of vlc-core-1.0.0-0.11rc3.fc12.x86_64.rpm. It was a rather painless process, and remember that Gentoo users do it every day before breakfast.

June 29, 2009 04:47 PM

James Morris: SELinux for Humans

I mean, SLUGs...

Paul Wayper gave a couple of talks on SELinux at this week's SLUG meeting, and has posted links to a couple of very useful slide decks:


The sysadmin slides look particularly useful, as they focus on solving common issues such as running FTP/SAMBA/Apache servers, and provide some very useful general tips, such as looking in the audit log and using policy booleans for high-level policy tweaking.

These slides may be the best, short introduction for sysadmins on the topic that I've seen. It's a difficult thing to get right.

June 29, 2009 04:02 PM

Kernel Podcast: 2009/06/28 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090628.mp3

For the weekend of June 28th 2009, I’m Jon Masters with a summary of the weekend’s LKML traffic.

In today’s issue: kerneloops.org weekly report, DRBD, Kmemleak, OOM, performance counters, trying harder, and VFAT.

Kernel Oops reports for the week. Arjan van de Ven posted an analysis of last week’s kerneloops.org reports. He cited a mem_cgroup_add_lru_list list corruption as of concern (asking why this is new), a memcmp in the raid code, and the item he previously brought to attention (get_free_pages) concerning warnings on order > 0 allocations in the low level page allocator. Number one on the list this week was an i915_gem_set_tiling issue.

DRBD. Philipp Reisner reposted (for the first time in 2.6.31) concerning his highly available block device for HA clusters. He says “As the first bit of the DRBD patch already got upstream…it is time to get more of DRBD towards mainline”. He wants the LKML masses to consider the lru_cache next.

Kmemleak. Kmemleak, when enabled in the kernel build configuration, aims to detect runtime leakage of kernel memory. But it can be very noisy and it prints very verbose output, which a number of developers have objected to, including Ingo Molnar (who says he has lost crash information due to that). So various suggestions and patches are floating around to trim both the output and the rate at which it is produced. One suggestion was also to be able to “watch” potential leaked regions with some kind of registration interface.

OOM. David Howells spent a long time bi-secting kernels until he found the git commit that has been causing a marked increase in OOM situations. It was a patch from MinChan Kim entitled “vmscan: prevent shrinking of active anon lru list in case of no swap space”. There is an (incorrect) assumption that nr_swap_pages cannot be zero on systems with swap, which it can. So debate is now happening over the best way to fix the patch for systems with swap.

Performance counters. Jaswinder Singh Rajput posted a patch adding support to the “perf” utility for “multiple events in one shot”. He adds new options to display HARDWARE and SOFTWARE events using a command such as “perf stat -w hw-events -e all-sw-events” wrapped around “ls” to display a number of stats for the running “ls” command.

Trying harder. Linus Torvalds replied to the ongoing get_page_from_freelist discussion concerning order > 0 GFP_NOFAIL allocations, in which David Rientjes had suggested a __GFP_WAIT allocation set the ALLOC_HARDER bit _if_ it repeats, saying that he “tends to like” the kind of “incrementally try harder” approaches to getting memory in such situations. In part because it ensures fairness – a new thread starting off won’t steal the page that an older thread has just had freed and really needs to grab right away.

VFAT. Andrew Tridgell posted an updated CONFIG_VFAT_FS_DUALNAMES patch implementing a new config option. It is now possible to selectively configure whether a Linux system using VFAT will create both long and short (8.3) filename entries for long filenames – with this configuration option disabled, Linux will not create the compatibility short filename alternative on long filename entries. Andrew also posted an FAQ and announced that the Linux Foundation has arranged for John Lanza to serve as a patent attorney and answer legal questions that come up relating to this patch (he was copied on the email and is hopefully ready to handle a volume of LKML traffic).

In today’s miscellaneous items: TSC based udelay should have rdtsc_barrier (Venkatesh Pallipadi), a number of Intel Moorestown boot fixes (Jacob Jun Pan), 62 “remove semicolon” patches (Joe Perches), reposted lockdep DFS to BFS conversion patches (Tom Leiming – claims the implementation is simpler), fixes to the S+Core architecture (Arnd Bergmann), a stop_machine patch for very large CPU count machines suffering from severe cacheline contention (Robin Holt), a triple update from Ingo Molnar (x86, timers, and tracing), some EDAC AMD64 fixes (Borislav Petkov), and SPARC fixes (David Miller). Ben Herrenschmidt requested Linus pull some fixes originally intended for rc1 that weren’t ready in time because he got sick for a couple of days.

Finally today. Various people have been mentioning ext3/4 filesystem errors upon resume from suspend (especially on ATA devices). There is suspected to be a bug somewhere but it is proving fairly elusive to track down.

In today’s announcements: dm-ioband version 1.12.0 (Ryo Tsuruta, disk bandwidth per partition control packages) and version 3.2g of the loop-AES file/swap crypto package.

The latest kernel release is 2.6.31-rc1, which was released by Linus last week.

Andrew Morton posted an mm-of-the-moment for 2009-06-25-15-49 which contains a number of updates against 2.6.31-rc1.

Stephen Rothwell posted a linux-next tree for June 26th. Since Thursday, it includes a fix for fbdev, and various subtrees gained build failures. The total tree count remains steady at 130 in the latest compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 29, 2009 03:46 PM

Dave Airlie: Kernel Conf Australia - ! all OSOL honest.

So there is a kernel conference on in Brisbane next month, being run by Sun.

Now when this was announced initially it was proposed as an all kernel hackery type get together for folks in the region, no matter which kernel they cared about.

(I did propose to talk about kernel graphics at this and got refused - so maybe I'm just being bitchy, however this is my blog).

20 speakers break down as follows:
12 Sun
1 Intel - Sun manager
1 RH
1 OpenBSD
1 FreeBSD
4 misc.

2 of the misc talks are in some way OSOL related,

the OpenBSD talk is about networking and pf, the RH talk is about security, FreeBSD about storage.

Now really if you aren't into OSOL or ZFS (3 slots OSOL FS related), why would you go? This conference is
local to me and I still couldn't justify paying the signup fee/taking the time to my manager at all. Now if
one of the main kernel and X.org hackers who lives in Brisbane can't be bothered to go, I do wonder
why anyone who isn't into OSOL kernels might be tempted.

There was talk of a .au unconference at one point, which maybe when the whole swine flu escapade is over might
actually be a useful meetup for the aussie open source community.

June 29, 2009 07:37 AM

Valerie Aurora: Common knowledge and the Cold War

I recently visited the National Museum of Nuclear Science and History - or, as I knew it growing up in Albuquerque, the Atomic Museum. The museum has a brand new full size building, with enough room to display most of their catalog for the first time, but not quite enough money to do so professionally. The result is a brief, magic window in which rare artifacts are finally out on display, but you can touch them and bang on them and crawl around in them. Many of the larger items, including a disassembled B-52 bomber and many rocket engines, are simply dumped in rows in a dirt courtyard in back.

Somehow, I expected that I would traipse through the museum, looking at old photographs and brushing up on my nuclear weapons trivia, with perhaps some solemn moments of reflection in front of the reproductions of Fat Man and Little Boy. Instead, I found myself oscillating between uncontrollable sobbing and open-mouthed technological awe. It went something like this: "Wow, a cyclotron! Holy crap, the Potsdam declaration. (Muffled sob.) A real nose-cone from an ICBM, cool! Whoa, photos of ground zero at Hiroshima. (Fountain of tears.)" I went back the next day to take some original photographs with the intention of writing a thoughtful, well-researched article on my personal experience.

Unfortunately, I have discovered that I seem to know almost nothing about the history of nuclear arms testing and development - and this is from someone whose parents worked on the Strategic Defense Initiative (Reagan's "Star Wars"), who read Richard Rhodes' "The Making of the Atomic Bomb" AND "The Making of the Hydrogen Bomb", who grew up in New Mexico, home of the Manhattan Project. More accurately, I knew some of the relevant facts, but in a vague sort of manner devoid of any connection with everyday life. They were numbers of megatons in a reference book, fictional movie plots involving lost nuclear weapons, and contrived acronyms for arms reduction treaties.

But walking through the museum, I saw brass Nazi goggles and notebooks, the car that carried the Trinity bomb to the test site, a copy of the Potsdam Declaration, movies of ordinary Japanese citizens clearing rubble with hand baskets in Hiroshima and Nagasaki, and the dented shells of nuclear missiles that were, for reals, lost in a midair collision over Spain and recovered after a multi-million dollar search. (Far more were lost and never found, in or over the ocean.) I saw, and touched, and yet still almost could not believe in, the outer shell of a "Davy Crockett" miniature tactical nuke - a literal "backpack nuke," small enough that I could encircle it in my arms. I thought backpack nukes were only a theoretical possibility, and yet they were manufactured, assembly line style. I was particularly struck by how heavily the shoulder straps of the backpack were padded - a consideration so practical and down-to-earth in the face of the incomprehensible horror of the weapon itself.

And then I really got myself in trouble: I bought a copy of Michael Light's 100 Suns from the book shop. It is a collection of 100 photographs of nuclear explosions from the U.S. nuclear testing program, during the time when nuclear tests were conducted above ground. I knew, intellectually, that Enewetak and Bikini Atolls had been practically obliterated by thermonuclear bomb tests, but seeing a 20"x26" color photograph of the fireball of an 11 megaton explosion is... entirely different. And entirely different than seeing it on the computer screen - the image below has nothing like the power of that in the book.


Castle Romeo test, Bikini Atoll, 1954, 11 megatons


Each photograph in this book symbolizes and encapsulates the conflicting and overpowering feelings I had in the museum: awe, excitement, and deep grief. My favorite photos are ones of the people watching the tests - most of them are bored, or matter-of-fact, but a few of the faces show the same awe and awareness that I feel when I look at the photos of the explosions, decades after the fact. The photos are accompanied by short footnotes at the end of the book, describing the technical and political circumstances and fallout (literal and figurative) of each test.

And here, yet again, I learned how little I knew: that several of the thermonuclear bombs accidentally exceeded expected yield by several megatons and accidentally sickened people (How!? can something as complex as a thermonuclear bomb go wrong - and result in even greater power?? That's not how computers work!), that we actually exploded nuclear weapons above the atmosphere and were surprised by the resultant EMP (I thought physicists predicted it, not that we knocked out Hawaii's power grid by accident and worked backwards from there), that the largest nuclear explosion ever was a 50-megaton test by the Soviets in the Arctic ("test" - it was entirely for political effect), that we exploded thermonuclear bombs in the continental U.S., that U.S. soldiers were put in trenches close to bomb tests in Nevada so that they could conduct maneuvers within a few hundred feet of the smoking, radioactive craters immediately afterwards.

It never even occurred to me that thousands (hundreds of thousands?) of people had witnessed nuclear tests and that I could go talk to one of these people and ask them what it was like. And I never would have guessed that I would be jealous of them, because more than likely, no human will ever witness a nuclear explosion first-hand ever again.

I don't know what to do now. Maybe most people already know these things, in which case it will be difficult to communicate my awe. Maybe they don't know these things, but I won't be able to cross the boundary between intellectual knowledge, like what I knew before I went to the museum, and the intense visceral awareness that the physical objects and photos gave me. Maybe I can't do better than Michael Light's magnificent book and I should just write him a positive Amazon review. Maybe I can do better, if I use all the resources available to me on this here World Wide Web.

Questions for you, dear reader:
Thank you for reading all the way through this.

June 29, 2009 06:05 AM

June 28, 2009

Pavel Machek: Weather forecasting

As you may know, 'interesting' weather hit the Czech Republic. Heavy rains
followed by floods claiming lives. What is more interesting, the
weather forecasting went crazy, too. yr.no normally works pretty well,
but these days, it oscillates crazily as the model is recomputed with new
data. (Friday's forecast basically said 'Saturday mostly nice with
light rain in the morning, Sunday rainy'; Saturday's forecast says
'heavy rain in the evening, only light rain on Sunday'.)

Now, forecasts got better. We used to use simple 'sunday rainy at 20C'
predictions, then medard-online came where you actually see data from
the model. Unfortunately browsing them is quite time consuming. yr.no
helps there: you select place and it shows you 3 day of prediction on
graph.

But it still lacks a lot: it only tells you expected values for the
predictions, and not the expected deviations (aka the amount of
certainty in the prediction). An "easy" way to solve that would be to run the simulation a few times, slightly varying the input variables each time, then display both the mean values and the calculated deviations.

....but I'm told that's not feasible, because weather forecasting is already computationally intensive as is. OTOH, weather forecasting is already repeated, once every few hours, when new data become available. The solution may be as easy as displaying the "old" predictions, too: if they are similar to the "new" prediction, the prediction is probably reliable. If not, well...
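
The "compare old and new predictions" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `forecast_spread`, the data layout, and the sample rainfall numbers are all made up here, and nothing below reflects what yr.no or medard-online actually expose.

```python
from statistics import mean, pstdev

def forecast_spread(runs):
    """Summarise several forecast runs for the same target hours.

    `runs` is a list of runs; each run is a list of predicted values
    (for example mm of rain), one per target hour. Returns a list of
    (mean, standard deviation) pairs, one per target hour.
    """
    if not runs:
        raise ValueError("need at least one forecast run")
    hours = len(runs[0])
    if any(len(run) != hours for run in runs):
        raise ValueError("all runs must cover the same hours")
    # zip(*runs) groups the predictions for each hour across all runs.
    return [(mean(values), pstdev(values)) for values in zip(*runs)]

# Hypothetical rainfall predictions (mm) for three target hours,
# taken from three successive model runs.
old_and_new_runs = [
    [1.0, 0.0, 4.0],
    [1.0, 6.0, 4.0],
    [1.0, 3.0, 4.0],
]
for hour, (avg, dev) in enumerate(forecast_spread(old_and_new_runs)):
    print(f"hour {hour}: {avg:.1f} mm +/- {dev:.1f}")
```

A small deviation means successive runs agree, so the prediction is probably reliable; a large one is the "if not, well..." case.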

This should all be easy to modify/check if weather modeling software was open source and did not require a supercomputer... is there such a beast? (I believe the model running medard is open source, but it is Fortran and a supercomputer is probably needed... plus where to get the source data?)

In a similar vein... extremely short term forecasts (< 3 hours) should also be extremely reliable. I'd really like android to use its gps, then warn me if the rain is coming... Maybe it is as simple as predicting cloud motion from weather radar? Is there maybe similar software/service already?

June 28, 2009 08:56 AM

June 27, 2009

Pavel Machek: android improvements

Android market lists both free software and closed source as
'free'. What is worse, demo versions are marked 'free', too. It would
be nice to use 'free', '$0', 'demo', 'adware' categories, because they
are very, very different.

On a related note, I now have (slightly stripped down) 2.6.31-rc1 booting on Dream, along with keymap that actually makes it useful. (Unfortunately, I used Zaurus userland, and some init script remaps keys back. I did a bit of grepping, but did not yet have the time to identify the culprit).

June 27, 2009 11:12 PM

Kernel Podcast: 2009/06/25 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090625.mp3

For Thursday, June 25th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Futexes, kmemleak, and mixed endianness.

Futexes. Thomas Gleixner came to Linus, cap in hand, apologizing for the mixup over correct usage of get_user_pages (the fourth argument is actually a number of pages and not a straight length – Peter Zijlstra previously posted fixes to the documentation of this function to avoid similar future mishaps). Thomas joked that he’s running out of brown paper bags to throw up into, and so is declaring all future futex bugs fixed by definition and thus features.
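
The difference between a byte length and a page count is easy to see with a little arithmetic. The sketch below is not kernel code; it is a Python illustration using a hypothetical helper, `pages_spanned`, and an assumed 4096-byte page size.

```python
PAGE_SIZE = 4096  # assumed page size for illustration; real systems may differ

def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    if length <= 0:
        return 0
    first_page = addr // page_size
    last_page = (addr + length - 1) // page_size
    return last_page - first_page + 1

# A 4-byte value normally sits inside a single page...
print(pages_spanned(0x1000, 4))
# ...but the same 4 bytes can straddle a page boundary.
print(pages_spanned(0x1FFE, 4))
```

Passing the raw length (4) where a page count is expected would ask for four pages in both cases, which is the kind of mishap the documentation fix is meant to prevent.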

Kmemleak. Dave Jones reported a lot of (likely false positive) kmemleak reports on his latest Fedora (rawhide) test kernels. Catalin Marinas followed up with some suggestions for kernel config changes and a noise reducing patch, enabling task stacks scanning by default, which Dave confirmed he had similarly done for his test kernels in response to the noise.

Mixed endianness. Andrew Paprocki wondered aloud whether it is really possible to mix endianness within a process on IA64. According to the documentation which Andrew cited, IA64 uses a .be bit in the PSR (Processor Status Register) to switch from one mode to another, although other (kernel) documentation says that this is not preserved by the kernel upon return from system calls – so the system is always returned to little endian mode following a system call. The question was whether it was practical to wrap system calls with a switch back from little endian to big endian once again. Nobody has answered him, yet.

Miscellaneous updates include: cpufreq lockdep fixes (Venkatesh Pallipadi), some fixes to avoid various races in irqfd/eventfd (Gregory Haskins), the 12th iteration of the per-bdi writeback flusher threads (Jens Axboe), various IDE fixes (David Miller, many by way of the previous maintainer), and a TTM page pool allocator patch for allocating e.g. AGP buffers for graphics from Jerome Glisse, which looks to be more of an RFC at this point. Steven Rostedt ACKed the general availability of the ring_buffer independently of tracing code following its use by this author’s hwlat patches.

Finally today, Robert P J Day announced that he is running his existing cleanup scripts against the kernel CONFIG options, looking for orphans. He says that he has found a number that are mentioned in a Kconfig file but not in fact used in the kernel tree at this point.
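
For a rough idea of what such an orphan hunt involves, here is a minimal Python sketch. It is not Robert's script: the function name `orphan_config_symbols` and the sample inputs are hypothetical, and it works on in-memory strings instead of walking a kernel tree.

```python
import re

def orphan_config_symbols(kconfig_text, source_text):
    """Return Kconfig symbols never referenced as CONFIG_<name> in source_text.

    Simplification: a reference such as CONFIG_FOO_MODULE is treated as a
    use of FOO_MODULE, not FOO, so a real tool would need more care.
    """
    # Symbols are declared on lines like "config FOO" or "menuconfig FOO".
    declared = set(
        re.findall(r"^\s*(?:menu)?config\s+(\w+)", kconfig_text, re.MULTILINE)
    )
    # C code and Makefiles refer to symbols with a CONFIG_ prefix.
    used = set(re.findall(r"\bCONFIG_(\w+)", source_text))
    return sorted(declared - used)

# Made-up Kconfig and source snippets for illustration.
kconfig = """
config FOO
    bool "Foo support"

config BAR
    tristate "Bar driver"
"""
source = """
#ifdef CONFIG_FOO
int foo(void);
#endif
obj-$(CONFIG_FOO) += foo.o
"""
print(orphan_config_symbols(kconfig, source))
```

Here BAR is declared but never referenced, so it is the only symbol reported.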

In today’s announcements: Ksplice for Ubuntu 9.04 Jaunty. Local Cambridge resident and Ksplice, Inc. founder Jeff Arnold announced that his company has begun offering updates for Ubuntu 9.04. For those just tuning in, ksplice is a dynamic kernel patching infrastructure allowing for “rebootless kernel updates”. It can handle ABI changes, structure modifications, all the kinds of things one might expect, and it doesn’t (necessarily) require a special kernel to begin with since it does all of its work in loadable modules under a kernel stop_machine context. Ksplice includes a lot of very interesting technology and the “Uptrack” service for Ubuntu is aimed to generate interest in their other commercial rebootless update offerings for the “Enterprise” distros. usbutils 0.84. Greg Kroah-Hartman announced version 0.84 of usbutils. This release fixes several bugs.

The latest kernel release is 2.6.31-rc1, which was released by Linus on Wednesday evening PDT.

Stephen Rothwell posted a linux-next tree for June 25th. Since Wednesday, his fixes tree contains several commits, the tree still fails to build in an allyesconfig build configuration on PowerPC, and a number of conflicts and build failures were removed in time for 2.6.31-rc1.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 27, 2009 06:19 AM

Dave Jones: Book review: The tipping point

I’m not much of a book reader. I often tend to start books, and never finish them. Occasionally I’ll pick them back up with the aim of finishing them, but end up starting over, and then abandoning them again at usually the same point. It’s actually been quite a while since I found a book that I couldn’t stop reading until I finished it. Recently, I found one such book: Malcolm Gladwell’s “The tipping point”.

The book explains why some ideas ‘tip’ or become really successful, while others never get off the ground. A lot of the book explained psychological reasons for why people behave the way we do in certain scenarios. From well known theories such as group think, to defining types of people, and the roles they play in the dissemination of ideas. I found Gladwell’s categorisation of people into connectors/mavens/salesmen intriguing, and spent a while thinking of various people I know and trying to categorise them accordingly. (It’s possible for someone to be in more than one category).

There’s also a lot of anecdotes in there to back up most of his points, coming from some wide and varied scenarios (The studies done by the Sesame Street researchers in order to create the perfect ’sticky’ educational TV show for example). In some cases they do drag on a bit. In some cases there are multiple examples where one would have sufficed, but overall, it wasn’t tedious whilst hammering home the point.

My only gripe with the book was that there was no mention of the opposite scenario. There’s lots of examples of successful ideas that ‘tipped’, and some ideas that didn’t, but there are no examples or dissection of “Why don’t bad ideas die?”. Some ideas no matter how many times they get shot down seem to bubble back up to the surface every so often. I have some of my own theories why these zombie ideas never go away, but I would have liked to have read the author’s take on it.

Anyway, rambling… Good book. Recommended. Wikipedia also has a pretty decent summary of all the points covered in the book.

Post from: codemonkey.org.uk


June 27, 2009 04:49 AM

Harald Welte: Free Software Foundation Europe elects new senior leaders

A couple of days ago the FSFE has announced its new president, vice president and executive team. This marks a big milestone, since the former president, Georg Greve, has been the president for more than 8 years, ever since the FSFE was conceived.

As you can read in Georg's blog, his stepping down as president was announced at the assembly last year, giving the organization a full year to prepare and think about potential successors.

I want to thank Georg for his dedication and exceptional work during the last years. The FSFE has played a very vital role with regard to Free Software related issues on a political level in Europe during that time. Involvement with the WIPO, the Microsoft anti-trust trial, the software patent debate, just to mention a few highlights.

When the FSFE was started, I always hoped to find some time to get personally involved and to contribute to it - but it seems that my many technical projects as well as gpl-violations.org have been draining already more time than I had.

I wish the new team all the best and hope (and expect) the FSFE will continue to play an ever-increasing role in the political debate around Free Software related issues.

June 27, 2009 02:00 AM

June 26, 2009

Valerie Aurora: Smiling!

My friend Kristen complains that she can't get a good photo of me because I'm always making some kind of face. She's right, I read too much Calvin and Hobbes growing up and now automatically make a Calvin face for every photo. But I made a heroic effort last night and got this one:

I know what you are thinking - someone who wears that much jewelry can't possibly understand cryptographic hash functions! That's totally untrue - although I did have to throw out my copy of Applied Cryptography to make room for my earring collection.

June 26, 2009 09:55 PM

Evgeniy Polyakov: A lazy simple 6a day

And partially 6b. That's what I climbed today, although without much success - it was rather hot and stifling, which does not favour high activity.

But nevertheless I tried a couple of 6a traces (one of them on-sight, the other one I had tried at a previous training), and failed on both, but they were simple falls, so I do not even count them as real ones - I did not find a hold on the on-sight trace, so I fell to look around, and the fall on the second trace was at the very end, when I was already tired.
But still they were not clean traces. I also climbed an interesting 6a+ on-sight, and again fell at the very end to check how the trace went. Tried a 6b, and it was not completed cleanly either - I was not able to complete the key, which I consider a major fall.

So, it was not something special, but not bad either, so I like how it goes.

June 26, 2009 07:46 PM

Evgeniy Polyakov: Anniversary 'Parallels' concert


Parallels band

Had great fun yesterday at the Parallels band concert. They celebrated one year of concerts and roughly 6 years since the group was created.

Many things have changed since those times: the music improved, the sky got darker, the green got dirtier. But hard & heavy are still there. And it was fucking cool!

More in gallery.

June 26, 2009 01:02 PM

James Morris: Security subsystem changes in the 2.6.30 kernel

Here's an update on the major changes to the kernel security subsystem for the 2.6.30 kernel.


The remaining changes were primarily bugfixes and enhancements across most parts of the security subsystem, including SELinux, SMACK, and keys.

Paul and I are finalizing the schedule for the security microconf at the upcoming Linux Plumbers Conference. It's looking like a great line-up at this stage—stay tuned for more details soon.

June 26, 2009 06:35 AM

June 25, 2009

Matthew Garrett:

I've had hard drives die before, but this one seems to be doing it in an especially pathological way. It started with the kernel throwing irq timeout errors and then stepped up into actual read errors, culminating in a corrupt journal, a read-only block device and a forced fsck. It's been behaving a little better since then, though occasional io stalls (without any kernel error) suggest that it's having to repeatedly retry some sectors. SMART says it's all fine, so obviously I'm backing it all up now before a new disk arrives tomorrow and I can sort out access to the data centre. Email might be a bit spotty for me until then.

Turns out that running a 2.5" PATA drive for approximately 4 straight years may not have been the best of ideas. Who'd have guessed?

June 25, 2009 09:38 PM

Kernel Podcast: 2009/06/24 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090624.mp3

For Wednesday, June 24th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: NMI watchdog and NOHZ, upcoming kerneloops reports, the Simple Firmware Interface, Slow work module unload fixes, unevictable pages, and USB APIs.

The merge window is now closed, and it’s obvious from the drop-off in patches. Although as Stephen Rothwell noted, there are still a number of trees (14) in linux-next that need to be merged (or in American English, “punted”, to 2.6.32). As Stephen says, “please do not shoot the messenger.”

NMI watchdog and NOHZ. David Miller followed up (again) on his previous issue regarding nohz, this time arriving at the conclusion that little prevents the NMI firing if an interrupt storm arrives immediately after the call to tick_nohz_stop_sched_tick(). Andi Kleen remarked that it would be safer to do tick disabling with interrupts off already, and that since the NMI watchdog is default off on most x86 systems, many people won’t have noticed.

Oops! The ever useful Arjan van de Ven posted a heads up regarding upcoming issues on kerneloops.org. Apparently, a lot of people have been bitten by an oops within get_page_from_freelist due to the changes to the VM intended to catch (and thoroughly blame) those calling kmalloc with __GFP_NOFAIL and order greater than zero. Unfortunately, this check turns out to be a little pedantic and there are many cases where legitimate users need more than order 0 single page allocations returned (e.g. in the SLUB allocator). Linus Torvalds even followed up explaining how little difference there really is between order 0 and order 1, and how it shouldn’t be a big issue until you face allocations requesting order 3 or above. For now, it looks like order 1 and above is the magic number that the check will be updated to catch.
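For anyone unsure what "order" means here: an order-n request asks for 2^n physically contiguous pages. Below is a minimal Python sketch of that arithmetic. The 4 KiB page size is an assumption (it is architecture dependent), and would_warn is a hypothetical model of a threshold check, not the kernel's actual code; the threshold is left as a parameter because the exact cut-off was still being discussed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; the real value depends on the architecture


def order_to_pages(order: int) -> int:
    """An order-n allocation covers 2**n contiguous pages."""
    return 1 << order


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of an order-n allocation."""
    return order_to_pages(order) * page_size


def would_warn(order: int, nofail: bool, max_quiet_order: int) -> bool:
    """Hypothetical model of the check: complain only about must-not-fail
    requests whose order is above max_quiet_order."""
    return nofail and order > max_quiet_order


if __name__ == "__main__":
    for order in range(4):
        print(order, order_to_pages(order), order_to_bytes(order))
```

With 4 KiB pages, order 0 is a single 4 KiB page, order 1 is 8 KiB and order 3 is 32 KiB, which gives a feel for why the higher orders are the ones that become hard to satisfy.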

Simple Firmware Interface (SFI). Matthew Garrett followed up to his previous message concerning the SFI posting from the Intel camp. Previously, Matthew had objected to parts of the SFI concept – which is largely a cut-down ACPI – on the grounds that it tended to create a vendor mess of incompatible ACPI-like implementations with all manner of extension tables. Matthew feels that “SFI appears to be presented as a generic firmware interface, but in reality it’s currently tightly wed to Moorestown [the Intel chipset] and I don’t see any way that that can be fixed without reinventing chunks of ACPI. I’m certainly not enthusiastic about seeing this present as a fait accompli in generic driver code”. One looks forward to the Linux Symposium “discussion”.

Slow work (if you can get it). Gregory Haskins posted a fix to the slow_work implementation, adding a module owner reference for module clients. Previously, the implementation did not have a means to ensure that slow-work threads had completely exited the text in question before it was yanked away by the module unload code.

Unevictable pages. Alok Kataria and Kamezawa Hiroyuki debated whether hugepages should be accounted as unevictable pages, and if so whether the name “unevictable” should be changed in procps output to “Pinned” or “Mlocked”. The problem is that, while hugepages are indeed unevictable, neither these nor the existing statistics fully account for every unevictable page present. Sometimes these aren’t known about even until vmscan tries to reclaim.

USB. There was some concern that lsusb was using an older API that stopped working unless CONFIG_EMBEDDED was set. This prompted several developers to question whether CONFIG_EMBEDDED should be required for “features” (it is intended to remove *unwanted* features), but Greg Kroah-Hartman explained that modern systems provide /dev/bus/usb and should be using that instead.

Miscellaneous updates include: A trivial update to the Intel TXT boot patches (Joseph Cihula, renaming a variable to make it global), IDE fixes (David Miller), Futex fixes (Thomas Gleixner, including the changes previously discussed to fault_in_user_writeable, fixing some incorrect assumptions in the previous patches regarding access_ok and whether a RW mapped region could go away under us), UWB (David Vrabel, trivial fixes), omapfb (Imre Deak, support for new LCDs and miscellaneous fixes), reducing the time taken for a single cpu online operation (Gautham R Shenoy, pseries), some networking updates (David Miller, mostly regular fixes), and a simplification of scripts/extract-ikconfig (Dick Streefland) removing the need for a special binary simply to extract a kernel config, which can be done in the bash script instead. David Airlie did try to post some drm-fixes, but it’s probably too late in the merge window at this point, as he noted.

Finally today, Mathieu Desnoyers asked about relicensing the LTTng, marker and tracepoints code under a dual license to include the lesser GPL license (LGPL v2.1). Although not objected to outright, some wondered why this was necessary, to which Mathieu responded that he wanted to allow userspace code that wanted to link to non-GPL code to still use the LTTng codebase.

The latest kernel release is 2.6.31-rc1, which was just released by Linus. Overall, Linus is extremely happy with how this merge window has gone. He adds, “On the whole? Tons of stuff. Let’s start testign and stabilizing.”

Stephen Rothwell posted a linux-next tree for June 24th. Since Tuesday, the fixes tree contains two commits for fbdev and UML, the rr tree gained a conflict against Linus’ tree and the dwmw2-iommu tree lost its conflicts. The PowerPC tree still fails to build in an allyesconfig configuration.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 25, 2009 09:15 PM

Valerie Aurora: SHA-1 collision expected within a year

Bruce Schneier predicts we'll see the first SHA-1 hash collision within a year, based on recent cryptanalytic results:

Bruce Schneier: Ever Better Cryptanalytic Results Against SHA-1

In other words, systems relying on the lack of collisions in SHA-1 (such as BitTorrent) for correct operation will start having interesting bugs in the next year, as I predicted in 2003.

I updated The code monkey's guide to cryptographic hash functions (and moved it to my own web site). I also created a summary page of my writings on cryptographic hashes, including the most up-to-date version of the "Breakout Chart" of cryptographic hash life cycles.

[Humorous hyperbole deleted since humor + intertubes = fail.] I don't think it made sense to write this paper, since I don't think anyone changed their software as a result of reading it, and it didn't have a positive effect for me personally either.

[Added so the comments don't fill up with git-related flamewars.] Git and rsync are fine, as are any hash-based systems which only allow trusted users to add data to the system. (Trusted not to deliberately add colliding inputs, that is.) BitTorrent, Venti, CAS-based shared caches, and anything else which allows potentially malicious users to add data to the system is another story.
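To make the distinction concrete, here is a minimal Python sketch of a content-addressed store that deduplicates by hash. Everything in it is hypothetical (ContentStore, weak_hash and find_collision are illustration names, not any real system's API), and weak_hash is a deliberately crippled stand-in that keeps only one byte of SHA-1 so that a collision can be found instantly.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ContentStore:
    """Toy content-addressed store: the key is the hash of the data."""

    def __init__(self, hash_fn: Callable[[bytes], str]) -> None:
        self._hash_fn = hash_fn
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self._hash_fn(data)
        # Dedup shortcut: if the key already exists, assume the data is identical.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in: only the first byte of the SHA-1 digest."""
    return hashlib.sha1(data).hexdigest()[:2]


def find_collision(hash_fn: Callable[[bytes], str], limit: int = 100000) -> Tuple[bytes, bytes]:
    """Brute-force two different inputs with the same hash."""
    seen: Dict[str, bytes] = {}
    for i in range(limit):
        data = str(i).encode()
        key = hash_fn(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
    raise RuntimeError("no collision found within limit")


if __name__ == "__main__":
    first, second = find_collision(weak_hash)
    store = ContentStore(weak_hash)
    store.put(first)
    key = store.put(second)
    print(store.get(key) == second)  # the second input was silently dropped
```

Whoever writes first wins: the second, colliding input is silently replaced by the first one on every later read. That is harmless if all writers are trusted not to craft collisions, and a problem if they are not.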

June 25, 2009 08:17 PM

Pete Zaitcev: Multi-process Firefox

Even if Google Chrome goes nowhere, it has done its duty by lighting a fire under Mozilla's butt (as mentioned previously). Remember how Classpath forced Sun to open Java? Yep. Thank you, evil Google overlords.

June 25, 2009 06:09 PM

Rik van Riel: Downtime notification

This Friday, June 26th, we will be moving to our new house. This also means that kernel newbies, the passive spam block list and the other sites will be offline for most of the day.

June 25, 2009 12:38 AM

June 24, 2009

Evgeniy Polyakov: The final initial elliptics network release: 2.5.0

This is a major milestone in the elliptics network roadmap. The system now has full support for all essential operations needed to create a fully self-contained distributed hash table storage.

Elliptics network is an object based distributed storage which supports different kinds of object replication, data deduplication, high-level file-based API and low-level object-based one. All logically complex parts are hidden behind provided API including failover connection processing, routing table maintenance, joining and synchronization protocols, merge strategies and IO itself.

Example applications contain a full-featured IO server and client capable of data replication and parallel reading and failover processing, system statistics gathering tool, notification receiver and history dump utility.

Elliptics server was built as a modular system which can have different storage IO backends, namely example server contains file IO storage (each transaction and history is stored in its own file, grouped into multiple directories), BerkeleyDB and Tokyo Cabinet database backends (each object and history are separate records in the appropriate databases).

This release brings us the following features:

This major release partially moves elliptics network into bug-fixing mode. Partially because all new features (like long-awaited distributed PAXOS-based locking) are supposed to be added as external projects, i.e. either linked as libraries or implemented as daemons with the appropriate protocol extensions. This is not a strict limitation though, and major changes can be made in the library itself.

TODO list includes:

I will start working on these features after a short delay (about a week or two), while this release settles down and is checked by the interested parties.

As usual, you can check a homepage, archive and GIT tree.
Enjoy!

And now I'm switching into something very different (to have some rest and dig into relaxing of handwaving, thinking and designing) - knowledge extraction and automatic state machine building (for the regexp parser for example, but generally for the deeper understanding of the FSM generation process).
Stay tuned - I expect a lot of interesting results!

June 24, 2009 07:57 PM

Matthew Garrett:

I spent last week in the US, during which I discovered that my preconceptions of Albany as a post-industrial wasteland with no redeeming features were incorrect (it's actually a post-industrial wasteland with some decent bars), made minor contributions (mostly in the form of pesto) to a prize winning chalk picture and cycled on the wrong side of the road for the first time in a ridiculous number of years.

But more relevantly, I spent most of the week trapped in Westford. One of the things I've been looking into lately is USB autosuspend, a kernel feature which allows devices to be powered down when they're not in use. This is especially interesting for USB, since it's a poll-based protocol. If the end device is powered up then bus traffic will be generated, even if everything's idle. USB autosuspend allows this to be avoided and thus saves a worthwhile amount of power.

However, drivers need to support USB autosuspend before it's useful. Further, devices need to be able to cope with it. Some older kernel releases had autosuspend enabled by default, leading to all kinds of fun as people discovered that their scanners and printers had stopped working. This was fixed by leaving autosuspend disabled by default for most hardware. The first component that I'm working on is support for drivers to indicate whether or not a given piece of hardware supports autosuspend, allowing it to be enabled by default for that kernel. Handling this at a per-driver level means we shouldn't end up with obnoxious white or blacklists. The second is adding support for autosuspend to more drivers. Right now I'm concentrating on hardware that's common in laptops. There's support for autosuspend in the qcserial and uvc drivers, and some experimental (and unmerged) patches for bluetooth and some other modem drivers. Palm have just released their kernel code, which includes autosuspend support for the cdc-acm driver. I'm working on cleaning these up and getting them into a mergable state, and part of what I was doing in Westford was testing that various pieces of hardware work correctly. The good news is that they seem to, and with a bit of luck we'll try shipping support for a lot of this in F12. If that doesn't end up causing real world problems then it ought to be safe to merge it all to mainline.

The final component of this is handling autosuspend on devices with userspace drivers. Our only real choice there is for packages to ship udev fragments that enable it for working hardware. I've added support to the fprint package in rawhide, which means that fingerprint readers should now be powered down unless they're opened. Nobody seems to have complained yet, so I'm cautiously optimistic that this can be sent upstream without any problems.

June 24, 2009 02:07 PM

Kernel Podcast: 2009/06/23 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090623.mp3

Do you pine for the days when men were men and wrote their own device drivers? Well, do you, punk?

For Tuesday, June 23rd 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: the continuing 2.6.31 merge window, IDE, Intel Trusted Execution Technology support, RCU, SFI, System Call tracepoints, and removing perl from the kernel build process.

The Continuing 2.6.31 merge window

per-bdi writeback threads. Jens Axboe wondered what was going to be done about his per-bdi writeback patch series, which are apparently “looking good” and have been in linux-next for almost a week without any problem reports. Jens wasn’t the only person wondering where his patches went. K. Prasad mailed to ask what was the plan with regard to his Hardware Breakpoint Interfaces, especially considering that (apparently), most of the previous concerns from Ingo Molnar and others have now been addressed in -tip.

Architecture updates include: Super-H from Paul Mundt (mostly SMP fixups), blackfin (Mike Frysinger, including a fair number of fixes from himself), and the new S+core architecture that was mentioned in this podcast previously. This ARM-reminiscent architecture is living in Arnd Bergmann’s tree at the moment while its author (Liqin Chen, who seems to be doing a great job figuring out Linux kernel development, including being the first user of Arnd’s new asm-generic defined ABI) figures out where to keep the 200kb tree. Currently, S+core runs LTP and a limited userland, but not against this tree.

Miscellaneous updates include: backlight updates (Richard Purdie, including a trivial kmalloc fix), LED driver support (also Richard Purdie, essentially bug fixes but also one new driver), infiniband (Roland Dreier), SCSI updates (James Bottomley, mostly a small set of driver updates and fixes), asm-generic fixes (Arnd Bergmann, whose tree is also hosting a new architecture port), fixes to watchdog (Wim Van Sebroeck), libata updates (Jeff Garzik), run-time power management of IO devices (Rafael J. Wysocki), and Kprobes jump optimization (replacing int3 breakpoints on x86 with jumps). For those interested in a discussion of git usage in kernel development, take a look at Linus’ replies to the “fix for shared flat binary format in 2.6.30” thread.

Non-merge specific concerns

IDE. Following yesterday’s rather abrupt announcement that David Miller would be taking over IDE maintainership, Tuesday brought a number of clarifications. First, Bart and David showed public support for one-another, with Bart saying he would have more time for “other projects”, and David explaining that this was all amicably done (so nobody need worry about the event – the sky is not falling and we can move on with our lives). Secondly, David posted a new patchwork location for those wishing to track ide patches in the future.

Intel Trusted Execution Technology support. Joseph Cihula posted version 5 of a patchset implementing Trusted Execution Technology support for Linux. As I have previously discussed, this patch series is intended to safeguard against a compromised system via ensuring all of the elements in the boot path are secure from attack. The technology aims to verify that the bootloader is secure, which then verifies the kernel, and so forth. It is implemented in the form of a tboot “kernel” loaded by the bootloader that sets up the dynamic root of trust via a special GETSEC[SENTER] processor instruction and then causes the real kernel to be loaded, after it has been verified.
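The "each stage verifies the next" idea can be illustrated with a small Python sketch. This is only a conceptual model under stated assumptions: it is not how TXT, tboot or GETSEC[SENTER] actually measure code, and the function names, stage names and the use of SHA-256 are all hypothetical choices made for the example.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def digest(image: bytes) -> str:
    """Measurement of a boot stage (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(image).hexdigest()


def first_untrusted_stage(chain: List[Tuple[str, bytes]],
                          known_good: Dict[str, str]) -> Optional[str]:
    """Walk the boot chain in order and return the name of the first stage
    whose measurement does not match the known-good value, or None if every
    stage matches."""
    for name, image in chain:
        if known_good.get(name) != digest(image):
            return name  # stop here: nothing after this stage should run
    return None


if __name__ == "__main__":
    chain = [("tboot", b"tboot image"), ("kernel", b"kernel image")]
    known_good = {name: digest(image) for name, image in chain}
    print(first_untrusted_stage(chain, known_good))
```

The point of the ordering is that trust only ever flows forward: a stage is run only after the stage before it has been found good, so a tampered kernel is caught before it executes.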

RCU. Paul McKenney posted a -tip proof of concept version of RCU designed for non-SMP, embedded systems, aiming to be small in footprint (as little as a quarter the size in memory use terms as other RCU options, according to the benchmarks that he attached to the posting). In addition to Paul’s patches, Jesper Dangaard Brouer posted a 10 part patch series aiming to ensure correct usage of rcu_barrier on module unload. Let’s remind ourselves that this issue was “discovered” last week and so far has resulted in a few fixes to David Miller’s net tree, with more to follow, including these patches.

SFI. Len Brown posted a patch series implementing a new “Simple Firmware Interface” (for which a talk is forthcoming at next month’s Linux Symposium), which seems to be a simplified version of ACPI currently with a single chipset implementation. While the idea is certainly interesting, Matthew Garrett was concerned that having essentially another ACPI (sub)implementation in the kernel was setting a precedent for more to follow. He preferred codebase sharing as the starting point, allowing for other sub-ACPI variants.

System call tracepoints. Jason Baron posted version 2 of his patch series implementing system call tracepoints. It includes the ability to toggle entry/exit tracing of each system call via the usual events/syscalls/syscall_blah/enable type interface. Since the previous version, Jason has added a number of fixes (locking, static allocation, etc), including support for system calls that take no argument.
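As a rough illustration of that enable-file style of interface, here is a minimal Python sketch. The relative layout events/syscalls/&lt;event&gt;/enable comes from the description above; the tracing root used as a default is an assumption (it depends on where debugfs is mounted), "syscall_blah" is only the placeholder name used above, and both helper functions are hypothetical.

```python
from pathlib import Path

# Assumption: location of the tracing directory when debugfs is mounted in
# the usual place; the posting itself only describes the relative path.
DEFAULT_TRACING_ROOT = Path("/sys/kernel/debug/tracing")


def event_enable_path(event: str, root: Path = DEFAULT_TRACING_ROOT) -> Path:
    """Path of the enable file for one syscall tracepoint event."""
    return Path(root) / "events" / "syscalls" / event / "enable"


def set_event_enabled(event: str, enabled: bool,
                      root: Path = DEFAULT_TRACING_ROOT) -> None:
    """Toggle tracing for one event by writing "1" or "0" to its enable file."""
    event_enable_path(event, root).write_text("1" if enabled else "0")


if __name__ == "__main__":
    # Placeholder event name; real names depend on the patch series.
    print(event_enable_path("syscall_blah"))
```

On a real system writing to these files needs root, and the set of event directories is whatever the running kernel exposes.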

Finally today, Rob Landley posted a three part patch series removing the use of perl from the 2.6.30 build, and replacing the offending perl script (kernel/timeconst.pl) with a much shorter (a quarter of the size) shell script that does the same thing. Separately, Benjamin Herrenschmidt continued the good fight figuring out how to make early SLAB initialization work on PowerPC. Amongst his findings was a need to move cpu_hotplug_init early enough, to which Linus responded that this could just be statically initialized.

In today’s announcements: RT version 2.6.29.5-rt22. Thomas Gleixner announced version 2.6.29.5-rt22 of the -rt patchset. The announcement contains three kinds of fixes – a network live lock fix, disabling preemption over the atomic section of iomap, and identifying false positives in softirq pending check (caused by a CPU going idle with the softirq pending bit of a blocked softirq thread still set).

The latest kernel release is 2.6.30, which was released by Linus on June 9th.

Stephen Rothwell posted a linux-next tree for June 23rd. Since Monday, he added a fix for an fbdev exposed compiler bug, the slab tree lost its build conflict, and yes, the powerpc tree continues to fail in an allyesconfig build configuration. The total sub-tree count remains steady at 130 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 24, 2009 08:08 AM

Valerie Aurora: Detecting fraud with math

Did anyone else watch "Mathnet" growing up? It was a show in which detectives used elementary mathematics to solve crimes. Fun, but silly - or so I thought.

As I noted in my Made-up-ness Quotient post, people are bad at making up numbers - that is, when they make up numbers, they often don't match obvious statistical patterns that appear in real numbers. Turns out that the people in charge of making up numbers for the election in Iran were also bad at it:

The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
-- The Devil Is in the Digits, Washington Post
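For the curious, here is a minimal Python sketch of one generic last-digit check: a chi-square statistic comparing observed last-digit frequencies with a uniform distribution. This is an assumption about how one might test such data, not the analysis used in the quoted article, and the function names are made up for the example. It also spells out the arithmetic in the quote: a probability of .005 is one in two hundred.

```python
from collections import Counter
from typing import Iterable


def last_digit_chi_square(counts: Iterable[int]) -> float:
    """Chi-square statistic of last-digit frequencies against a uniform
    distribution over the digits 0-9 (9 degrees of freedom)."""
    digits = [abs(n) % 10 for n in counts]
    if not digits:
        raise ValueError("need at least one number")
    expected = len(digits) / 10
    observed = Counter(digits)
    return sum((observed.get(d, 0) - expected) ** 2 / expected
               for d in range(10))


def one_in(probability: float) -> int:
    """Express a probability as 'one in N' odds."""
    return round(1 / probability)


if __name__ == "__main__":
    print(one_in(0.005))
    print(last_digit_chi_square(range(100)))
```

The larger the statistic, the less the last digits look uniformly distributed; turning it into a probability requires the chi-square distribution with 9 degrees of freedom, which is left out of this sketch.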

Note to future election-stealers: Hire a statistician!

June 24, 2009 03:42 AM

James Morris: SELinux Developer Summit: CfP closes 1st July (1 week)

Just a reminder for SELinux developers and anyone interested in the internals of SELinux that the SELinux Developer Summit CfP closes on July 1st, which is about a week away.

SELinux logo


Details of the CfP are here. Don't forget to join the event mailing list if you're attending.

Proposals for presentations, lightning talks, and development sessions should be submitted via email per the instructions in the CfP. Proposals do not need to be especially detailed: if you have a good idea, simply send it in.

mystery object


For reading this, you are rewarded with a mystery object (pictured above). See if you can figure out what it is before clicking on it and reading the comments @ flickr.

June 24, 2009 02:21 AM

June 23, 2009

Matthew Garrett:

It's now been over 6 months since Poulsbo hardware with Intel's GMA500 graphics core started shipping in volume. And we're still utterly lacking in any sort of worthwhile driver. It's an impressive turnaround from the recent days when the straightforward recommendation for mobile Linux hardware was "anything that has lots of Intel stuff in lspci", and while the Poulsbo situation in itself doesn't change that hugely it's potentially symptomatic of a worrying trend within parts of Intel.

The first thing to realise here is that, like most large companies, Intel consists of a large number of business units with different priorities. Their open-source technology center has historically had responsibility for providing Linux support for hardware, but this obviously depends on other business units cooperating with them. And there's strong evidence that many of those business units don't get it.

There's been signs of this for some time. Back before the days of the Intel X.org driver gaining native modesetting support, some people ran the Intel embedded graphics driver. This was (is?) a closed X driver that was able to provide native modesetting on platforms that could only otherwise be run at incorrect resolutions. One business unit was shipping a driver that was more functional than the official Intel Linux driver. To the best of my knowledge, none of that code was ever used in the rewritten Intel driver that now provides the same features.

Poulsbo is another example of this. Intel wanted a low-power mobile graphics chipset and chose to buy in a 3D core from an external vendor. IP issues prevent them from releasing any significant information about that 3D core, so the driver remains closed source. The implication is pretty clear - whichever section of Intel was responsible for the design of Poulsbo presumably had "Linux support" as a necessary feature, but didn't think "Open driver" was a required part of that. There's not a lot any other body inside Intel can do once IP-limiting contracts are signed and the hardware's shipping, but it ends up tarnishing the good reputation that other parts of Intel have built up anyway.

And while Poulsbo is the most obvious example of this to date, it's not the only one. Intel recently decided to make the EFI development kit discussion lists private. Various drivers for Moorestown (the followup platform to Poulsbo) have been submitted to the Linux kernel, and while they have the advantage of being GPLed they have the disadvantage of being barely above the level of typical vendor code. Objections that chunks of them simply don't integrate into Linux correctly have done little to get these problems fixed - I still have no real idea how the runtime interface to power management on the SD driver is supposed to be used, but I suspect the answer is probably "badly".

This all makes sense if you assume that there are large groups of people in Intel who don't talk to each other. But to the casual observer it just looks schizophrenic. Explaining to an irate user that the Intel who shipped a closed Linux graphics driver is only barely the same Intel who contribute so much to architectural improvements in the Linux graphics stack doesn't make their hardware work. And while all of this confusion is going on, Intel's competitors are catching up. Atheros are now making significant contributions to the state of Linux wireless. AMD are releasing graphics chipset documentation faster than Intel, and radeon support is improving rapidly.

Is the future going to be one where we can no longer simply say that Intel hardware will Just Work? Is their work on Moblin (easily the most compelling Linux UI for netbooks) going to be wasted on the broader Linux community because it'll mostly end up running on hardware that's not supported by the mainline Linux kernel? Does Intel have a real commitment to open source, or is that being lost in the face of short-term requirements?

Intel need to demonstrate that they have a company-wide understanding of what Linux support actually means or risk losing much of what they've earned over the past few years. I'm desperately hoping that Poulsbo and what we've seen so far of Moorestown are the exception, not the future norm.

June 23, 2009 08:41 PM

Evgeniy Polyakov: Lazy falls

Climbing training was rather uncool - I did not finish any trace without falls.
Started with the quite complex beginning on the negative horizontal balcony - did not complete.
Moved to really complex 6c+ with the negative horizontal balcony and full of very passive holds - fell.
Then tried 6c over crack-in-the-rock holds - fell.
Tried new 6a (with the same balcony though) - again (although I think I will complete it next time without falls if started at the beginning of the training).

Started to make pull-up exercises on the holds (I selected rather deep ones and used big finger to help 2/3 of the time though) - succeeded. Although something tells me that since I started to pull up without fully straightening my arms (they are quite close to 180 degrees at the elbow, but a bit less), it is not very fair. It looks like blood does not circulate well when arms are not yet fully unbent and it prevents them from quickly becoming clogged up, so I'm able to make several pulls really quickly and without much effort.

But anyway, even though the training was lazy and without excellent success stories, it was a good one.

June 23, 2009 07:39 PM

Evgeniy Polyakov: New elliptics network features

In the meantime I implemented the last two features scheduled for the new release:

The former was a rather trivial patch, but its consequences are vital for the correct functioning of the distributed storage. Now every joining node that holds objects whose IDs fall outside the node's maintained range will send them into the network, to be stored on the other nodes. In particular this is useful for the cases when a network failure splits the whole network into parts. Previously a joining node only copied data from other servers, but did not announce its own objects which were outside of its ID range.
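A minimal Python sketch of that join-time decision follows. It assumes, purely for illustration, that a node is responsible for a half-open interval [start, end) on a ring of integer IDs; the real elliptics ID space and range rules are not described here, and both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def in_range(obj_id: int, start: int, end: int) -> bool:
    """True if obj_id lies in the half-open ring interval [start, end).
    The interval may wrap around the end of the ID space."""
    if start <= end:
        return start <= obj_id < end
    return obj_id >= start or obj_id < end


def split_on_join(local_ids: Iterable[int], start: int,
                  end: int) -> Tuple[List[int], List[int]]:
    """Partition locally stored object IDs into those the joining node keeps
    and those it has to send into the network."""
    keep: List[int] = []
    forward: List[int] = []
    for obj_id in local_ids:
        (keep if in_range(obj_id, start, end) else forward).append(obj_id)
    return keep, forward


if __name__ == "__main__":
    keep, forward = split_on_join([1, 5, 9, 12], start=4, end=10)
    print(keep, forward)
```

After a network split heals, the forward list is exactly the data the rest of the network would otherwise never learn about.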

Object removal is a rather straightforward feature obviously needed for the system in general. But there are tricky places. Namely, transaction support in the elliptics network implies that a transaction with the same content will not be stored twice even if it was written into two or more different objects. Instead a single object's history, corresponding to the transaction content, will be updated to reflect that it is now shared by different objects.
Thus we can not simply erase a transaction if it belongs to some object to be deleted; instead we just drop a reference from the appropriate transaction history and only remove the transaction when there are no external objects referencing it. This requires a fair amount of testing, and another automatic test.
Also it is only implemented for the file IO storage backend now, and although it was made quite generic, the BerkeleyDB and Tokyo Cabinet IO backends need to be updated.
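The reference-dropping logic can be sketched in a few lines of Python. This is a simplified in-memory model with assumed names and data structures (TransactionStore, a SHA-256 content ID, a set of referencing objects as the "history"); it is not the elliptics implementation or its on-disk format.

```python
import hashlib
from typing import Dict, Set


class TransactionStore:
    """Toy model: transactions are stored once per content, and each keeps a
    history of the objects that reference it."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}
        self._history: Dict[str, Set[str]] = {}  # transaction id -> object ids

    def write(self, object_id: str, data: bytes) -> str:
        tid = hashlib.sha256(data).hexdigest()  # assumption: content-derived ID
        self._content.setdefault(tid, data)     # same content is stored only once
        self._history.setdefault(tid, set()).add(object_id)
        return tid

    def remove_object(self, object_id: str) -> None:
        shared = [t for t, refs in self._history.items() if object_id in refs]
        for tid in shared:
            refs = self._history[tid]
            refs.discard(object_id)              # drop only this object's reference
            if not refs:                         # last reference gone: really erase
                del self._history[tid]
                del self._content[tid]

    def has_transaction(self, tid: str) -> bool:
        return tid in self._content

    def references(self, tid: str) -> Set[str]:
        return set(self._history.get(tid, ()))


if __name__ == "__main__":
    store = TransactionStore()
    tid = store.write("a", b"same content")
    store.write("b", b"same content")
    store.remove_object("a")
    print(store.has_transaction(tid), store.references(tid))
```

Removing object "a" leaves the shared transaction in place because "b" still references it; only removing "b" as well erases the content.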

The exported removal API contains two functions - a high-level function to remove a file (the same way as files are read and written from the local storage into/from the network), which blocks until completion and removes all file copies according to the set of registered transformation functions; and a low-level function to remove a single object, which has a callback invoked when the node acks the transaction removal. The latter function will block waiting for the acknowledgement if no callback was specified.
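The "callback if given, otherwise block" behaviour of the low-level call can be sketched as follows. This is a Python model under assumptions, not the library's C API: remove_object, the send_remove transport and the timeout parameter are all hypothetical names introduced for the example.

```python
import threading
from typing import Callable, Optional

AckHandler = Callable[[str], None]
# Hypothetical transport: sends a removal request and later calls the handler
# with the object ID once the remote node acknowledges it.
Transport = Callable[[str, AckHandler], None]


def remove_object(send_remove: Transport, object_id: str,
                  on_ack: Optional[AckHandler] = None,
                  timeout: Optional[float] = None) -> bool:
    """With a callback: hand off the request and return immediately.
    Without one: block until the ack arrives (or the timeout expires).
    Returns False only if a blocking wait timed out."""
    if on_ack is not None:
        send_remove(object_id, on_ack)
        return True
    acked = threading.Event()
    send_remove(object_id, lambda _object_id: acked.set())
    return acked.wait(timeout)


if __name__ == "__main__":
    def fake_send(object_id: str, handler: AckHandler) -> None:
        # Stand-in for the network: ack shortly afterwards from another thread.
        threading.Timer(0.01, handler, args=(object_id,)).start()

    print(remove_object(fake_send, "some-object", timeout=2.0))
```

The blocking variant is just the callback variant with an internal event as the callback, which keeps the two code paths from drifting apart.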

Next task related to elliptics network will be a POHMELFS port. Then distributed locks.

But before that... Some shiny new ideas to think about in a very different area. And I expect it to be extremely interesting. Stay tuned!

June 23, 2009 02:22 PM

Kernel Podcast: 2009/06/22 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090622.mp3

Would you be prepared if gravity reversed itself? The only thing I can’t figure out is how to keep the change in my pockets.

For Monday, June 22nd 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: the continuing 2.6.31 merge window, the IDE tree, documenting the rc-series and the merge window, taking a core dump, kernel boot delay madness, IO scheduler based IO controllers, and some feedback.

The Continuing 2.6.31 merge window

Device Mapper. Alasdair Kergon posted a series of device mapper patches. These are mostly a consolidation of various fixes, including ioctl support cookies for udev, and some documentation updates.

Firewire. Stefan Richter posted a series of firewire (IEEE1394) updates. In this posting, Stefan notes that the “new” stack is now preferable over the legacy one, except in the case of audio devices (for which he notes that it is possible for distributions to package both stacks in their releases).

Networking. David Miller posted a series of updates to the networking stack (including an indirect reminder that John Linville is away at the Wireless summit at the LinuxTAG in Berlin this week). Amongst the updates were lots of fixes, and he still expects some netfilter regression fixes to follow. Separately, David wondered aloud how the NMI watchdog and NOHZ might interact badly if a system were truly idle (triggering the NMI watchdog unnecessarily), but ultimately convinced himself that he was looking for another SPARC cause.

NFS private namespaces. Trond Myklebust reposted his private namespaces for NFS series of patches. When these are applied, the kernel gains the ability to create a private mount namespace that is not visible to user processes. These were originally targeted for 2.6.31 and in the absence of objections, Trond is hopeful that they will be ACKed and accepted forthwith.

PCI. Jesse Barnes posted an updated PCI git repository. He points out that the latest updates are “much less aggressive” than those targeting 2.6.30, although he noted that the latest tree does include AER (Advanced Error Reporting) enhancements, especially on multiple error conditions.

Performance counters. Ingo Molnar responded with a series of replies to a long series of replies from Stephane Eranian concerning Ingo’s posting of Performance counters patches for the merge window. There were many different comments here, but they showed a difference in opinion between the various potential users for performance counters. Ingo states that his main concern is making tools (such as perf) a “useful solution to developers/users”, which is “a key area where…perfcounters and perfmon differs”. He also notes that it aims to be “‘Oprofile done right’ and ‘pfmon done right’”. The thread makes for some interesting reading if you are interested in performance counters (and, honestly, who listening to a podcast such as this one wouldn’t be?).

Architecture updates include: s390 (Martin Schwidefsky).

Miscellaneous updates include: irqfd/eventfd patches from Gregory Haskins, suppressing page allocator warnings about order >= MAX_ORDER when the code causing this is doing the right thing and intentionally gets the warning (by adding a new __GFP_NOWARN kmalloc flag), exofs/osd tree updates from Boaz Harrosh, and a rather interesting one from Krzysztof Mazur noting that arch_get_unmapped_area() in the generic core doesn’t correctly ensure that the address it returns is greater than TASK_UNMAPPED_BASE.

Non-merge specific concerns

The IDE tree. David Miller and Bartlomiej Zolnierkiewicz had a “discussion” in which David expressed some frustration about the lack of testing of various bug fixes, to which Bartlomiej suggested that David might take over IDE, which David offered to do on the spot. He posted a new IDE tree address, and various folks sent well wishing mails. But Linus wasn’t so keen on the situation, saying that he really didn’t want to take the tree David Miller had put together. Quoting Linus, “I really don’t want to take this. I think you [David] and Bartlomiej should spend a _lot_ more time and effort trying to resolve this. Me taking it just closes the doors for trying to be constructive about issues.” There followed a debate about the current users of ide vs. pata and libata, and why more people don’t just move away from legacy ide code. Arnd Bergmann pointed out that a number of architectures (especially those without dma-mapping.h support, often true for the NOMMU architectures) can’t use libata at all.

Documenting the rc-series and merge window. Luis R. Rodriguez reposted his quite excellent documentation on the rc-series and merge window process. In the latest version, he adds the average time between the last ten releases (86.0 days currently).

Taking a core dump. Neil Horman posted an interesting little patch that aims to fix three deficiencies in the current core dumping code. Firstly, he fixes recursive dump handling (where the dump handler specified in core_pattern actually crashes while it is helping us to take the full dump). Secondly, Neil allows the core_pattern process to complete, waiting for it in case it wants to poke at the procfs entries for the crashee process. Finally, he adds a brand new sysctl called core_pipe_limit that bounds parallel core dumps.
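
To make the idea of bounding parallel dumps concrete, here is a minimal userspace sketch in Python. It is not Neil’s kernel patch: the class and method names are invented, and both the choice to refuse dumps beyond the limit (rather than queue them) and the treatment of a limit of 0 as “unlimited” are assumptions made for the illustration.

```python
import threading


class CorePipeLimiter:
    """Hypothetical model of a limit on concurrently piped core dumps."""

    def __init__(self, limit):
        self.limit = limit   # assumption: 0 means "no limit"
        self.active = 0      # dumps currently being piped to a helper
        self.lock = threading.Lock()

    def try_begin(self):
        # Admit a new dump only while we are below the limit.
        with self.lock:
            if self.limit and self.active >= self.limit:
                return False  # assumption: excess dumps are skipped
            self.active += 1
            return True

    def end(self):
        # Called when the helper has finished with the crashed process.
        with self.lock:
            self.active -= 1
```

With a limit of 2, a third simultaneous crash is turned away until one of the first two helpers has finished.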

Kernel boot delay madness. David Miller objected to a patch from Simon Arlott adding yet another boot parameter to the kernel, this time to obviate a (possibly 2 seconds in duration) reset delay for physical network PHYs that have already been initialized on boot. David objected that “this is getting out of control” (referring to the boot delay parameter craziness), adding “We’re not going to add a hundred different obscure module options to eliminate delays and device resets”.

IO scheduler based IO controllers. Vivek Goyal followed up to his Friday posting concerning the latest iteration of his IO scheduler IO controller, noting that he had not done testing with AIO (Asynchronous Input Output). A dialog ensued between Vivek and Jeff Moyer over the best options to use for benchmarking to ensure that DIRECT IO was also being requested.

Finally today, thanks for the feedback on this podcast. It really means a lot to me that I’m providing something of some value to the community, and having a little fun in the process, especially at 4am on a Sunday morning. Do drop me a line and let me know what you think! If you’d be willing to record a few words about what you work on for me during the OLS or Plumbers conference, please do let me know, or just find me at the event and we’ll hook it up.

In today’s announcements: git version 1.6.3.3. Junio C Hamano announced git version 1.6.3.3, which includes fixes for cygwin, memory leaks, and a number of other fixes.

The latest kernel release is 2.6.30, which was released by Linus on June 9th.

Stephen Rothwell posted a linux-next tree for June 22nd. Since the previous day, two new trees were added for davinci and my hwlat hardware latency detector. Stephen also pulled in the “new” ide tree, although that might change if the discussion is revived following Linus’ comments. Today’s tree is mostly free of conflicts, and yes, powerpc still fails to build in an allyesconfig build configuration. The total sub-tree count is now up to 130 trees, due to the addition of the two aforementioned new sub-trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 23, 2009 06:11 AM

Harald Welte: GSM wardriving has started

As you can see in this picture referenced by this blog post, somebody is having real fun using the BS-11 and OpenBSC for GSM wardriving.

Please note that the BS-11 is a 200W AC powered device, so you need the entire trunk full of lead batteries and a reasonably sized UPS to provide it with power.

There are much lighter setups using a laptop and a nanoBTS, but then those setups are likely a factor 10 more expensive (and provide less RF power).

But what this all tells us: GSM wardriving has started. More security researchers are looking into GSM security than a year ago, much due to the successful growth of a community around OpenBSC. Many people are only starting with GSM and mainly using/playing with the software, the number of actual contributors to the code is still small...

On a larger scale, you can see that GSM insecurity is finally going to become a much more popular topic, with more people able to demo the various long-known issues such as lack of mutual authentication and insufficient threat models/analysis during protocol design.

June 23, 2009 02:00 AM

June 22, 2009

Jens Axboe: Cheaper SSD reliability, continued

So after twice promising me to get info from 'an engineer' in a 10 day time span, I pushed OCZ again today. The answer is that it's likely "bad blocks" on the drive and they offered to exchange it. Now, I don't know the internal secret sauce to their flash chip striping, but a bad flash block may explain the issue. And personally I care a lot less about this specific drive than the larger issue at hand, which is: Can we trust these SSD drives? From this experience, the answer so far is unequivocally no. If the flash block/page/erase block is indeed bad (or partially bad), I want to know! Don't just send me all ones in the data, that's not acceptable. If their drive/fw doesn't do error handling, I'm quite sure the customers would like to know this fact.

Apparently Indilinx does the firmware for these drives for at least two manufacturers. Given that the SSD consumer market is steaming ahead at this point, I'm guessing there's a huge rush to reduce the time to market. A safe bet would be that the firmware is perhaps a little too rushed in this case. Coincidentally, the bad drive is running fw 1.10. Version 1.30 lists this little juicy fix among the others: "Read fail handling". I'll try 1.30 on this drive and re-read the data, just to see what happens.

For the time being, I can't recommend using Indilinx based drives anywhere except for throw away data. If they can't even tell you when the data has gone bad, then they really can't be used for much else. At least use btrfs with data checksums enabled, then you could catch a problem like this. Yes I did run btrfs on my Vertex, and yes I did disable datacsum to avoid the extra CPU use on my laptop... My personal recommendation would be to stick with Intel or Samsung SSD drives where data matters.

June 22, 2009 07:35 PM

Rik van Riel: Moving house

We're moving to a new house this coming Friday and Saturday. Having learned from our last move, we started packing a few weeks ago and have almost run out of things to pack. Also, the garage grew a mountain of moving boxes:

[Image: moving boxes]

June 22, 2009 04:23 PM

Kernel Podcast: 2009/06/21 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090621.mp3

This podcast is brought to you in part by way too many California strawberries.

For the weekend of June 21st 2009, I’m Jon Masters with a summary of the weekend’s LKML traffic.

In today’s issue: The continuing 2.6.31 merge window, the “Ceph” distributed filesystem, IO scheduler based IO controllers, poisonous hardware, transcendent memory, and ksplice tainting.

The continuing 2.6.31 merge window

Core kernel. Ingo posted a few updates to the core kernel. Amongst these was a bugfix developed in collaboration with Thomas that included a new function named get_user_writeable for use by the futex code (which can’t rely upon the existing access_ok for private futexes). A dialog ensued between Linus, Ingo Molnar and Thomas Gleixner concerning use of get_user_pages_fast() in this code, which Linus pointed out could be replaced with a single instruction on Intel-esque systems at any rate.

DRM. Dave Airlie posted a final drm tree for 2.6.31. Amongst the major changes was a switch in the AGP code to use arrays of pages instead of arrays of unsigned long. Quoting Dave, “since pageattr grew patch array interfaces this is possible and should solve GEM on PAE issues”.

KVM Support for 1GB pages. Joerg Roedel posted version 3 of a patch series that gives KVM the ability to support 1GB pages. This relies upon nested paging support, a feature of modern CPUs which behaves very similarly to an additional level in the global page table hierarchy. The patch series relies upon exporting vma_kernel_pagesize to modules.

Per-cpu. Ingo Molnar responded to yesterday’s “percpu for 2.6.31” pull request posted by Tejun Heo (that had gotten slightly warped in the posting and caused Linus to be slightly unhappy), pleading with Linus and company to reconsider taking the per-cpu changes due to the fact that the patches had been posted in a timely fashion, and the sheer amount of work Tejun will be committed to if he must maintain them for yet another cycle (170 files worth of changes).

Performance counters. Paul Mackerras noted that architectures like PowerPC64 define __u64 to be unsigned long rather than unsigned long long, which causes compiler warnings every time one prints such a value with the print format string of %Lx. To correct this, Paul posted a patch to these userspace tools providing their own implementation of the definition of types such as u64.

RCU. Paul E. McKenney posted version 8 of his “big hammer” expedited RCU grace periods patchset. This patchset uses the existing per-CPU migration kthreads, which are awakened in a loop and waited for in a second loop, in order to expedite the passage of an RCU grace period. Apparently, this patchset can reduce RCU grace periods to 40us on an 8-CPU POWER machine.

Syscall tracepoints. While it is yet to be decided exactly when Jason Baron’s proposed syscall tracepoints will make it in, Li Zefan did use the opportunity to discover a bug in seqfile handling in the kernel trace infrastructure for which he posted a series of patches.

David Miller noted that stack backtrace support had broken sometime in the past day or so, which Stephen Rothwell was already aware of. Stephen forwarded a patch from Mike Frysinger that fixed it, which was also good news for Ingo.

Miscellaneous updates include: MMC updates (Pierre Ossman), Cryptography (Herbert Xu), ALSA (Takashi Iwai), NFS (Trond Myklebust, including support for version 4.1 of the NFS standard), Watchdog (part 2, apologies for not having space to mention part 1 yesterday), the usual level of tree posting insanity from Ingo (IRQs, scheduler – including another attempt to hide runqueues from those that would poke at them, timers, tracing, and x86), IDE (Bartlomiej Zolnierkiewicz), input updates (Dmitry Torokhov) and some kbuild fixes from Sam Ravnborg.

Architecture updates include: PowerPC (Benjamin Herrenschmidt), Blackfin (Mike Frysinger), and Microblaze (fixing a build problem caused by the previous round of Microblaze architectural updates).

Non-merge specific concerns

Ceph distributed filesystem client. Sage Weil posted a 21 part patch series implementing a “Ceph” distributed filesystem client, in the staging tree. “Ceph” is apparently a distributed filesystem designed for reliability, scalability, and performance, which relies on btrfs underneath. It features the usual kinds of things – data replication, no single points of failure, and fast recovery from node failures, although the fact that it’s only just going into the “staging” tree obviously means you shouldn’t rely on this client for critical stuff at this point. Separately, Greg posted a large number of changes to Linus for the “staging” tree (and by large, we mean 658 files changed, 165585 insertions, and 240493 deletions). Quoting Greg, “We are removing more crap than we are adding, looks like progress to me!”.

IO Scheduler based IO Controller. Vivek Goyal posted version 5 of his IO scheduler IO controller patchset. This patchset aims to introduce an ability to assign and control IO bandwidth consumed by tasks through IO throttling. A number of additional changes have been made since version 4, but these are mostly fixes and it looks like the patchset is stabilizing now.

Poisonous Hardware. Fengguang Wu posted version 6 of his HWPOISON patchset. This version has many of the changes discussed previously in this podcast. Included amongst those are the switched default to “late” kill except for those processes that have specifically requested an “early” kill via a per-process tunable option, as proposed by Nick Piggin and Hugh Dickins. Other changes include killing off the “uevent” emission idea, tainting the kernel on poisoned page detection, and not “mess”ing with dirty/writeback pages for now.

Transcendent memory (“tmem”). Dan Magenheimer posted a 4 part patch series (first as an email attachment, then as a normal series), implementing what he described as “tmem” for Linux. Essentially, this is support for transient memory of a “dynamically variable size”, addressable only indirectly by the kernel, and which might disappear without warning. It may seem (on the face of it) to have little utility, but the application is in virtual machines (or other non-virtualized environments, including hotplug memory, SSDs, page cache compression, and even highmem on non-highmem kernels and using spare VRAM) being provided with memory for caching (and similar purposes) that might be taken away at any moment without any warning. Since it requires kernel assistance, its application is mostly for in-kernel caches. The patch series is fairly comprehensive, and there will be a talk on the design on the first day of the 2009 Linux Symposium in Montreal, Canada.

Finally today, the ksplice guys requested a new TAINT flag so those loading ksplice updates into their kernels would be able to detect this easily (especially vendors of those concerned). Peter Zijlstra objected on the grounds that ksplice isn’t upstream, although it does still seem (to this author) that it would be a worthwhile thing to have in mainline anyway.

The latest kernel release is 2.6.30, which was released by Linus on June 9th.

Stephen Rothwell posted a linux-next tree for June 19th. Stephen added one fix (for symbol checking, affecting ARM), and noted that Linus’ tree gained a build failure due to a compiler bug (for which he reverted the offending commit). A few other trees lost conflicts, and the tree continues to fail to build for those seeking an allyesconfig build configuration on PowerPC. The total number of sub-trees remains steady at 128 again today (apologies for missing the total in yesterday’s summary podcast).

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 22, 2009 08:45 AM

Kernel Podcast: 2009/06/18 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090618.mp3

Support for this Podcast comes from an unhealthy amount of coffee. Mine’s a double Americano, what’s yours?

For Thursday, June 18th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: The continuing 2.6.31 merge window, direct mmap for FUSE/CUSE, racing in TCP receive, problems with sys_mount(), and kernel.org front page kernels.

We’re playing catchup here, largely because this is the first merge window this podcast has had to cover and it takes, well, a certain mind set.

The Continuing 2.6.31 merge window

Dynamic per-cpu. Tejun Heo posted an updated per-cpu git tree for 2.6.31, that takes into account many of the recent per-cpu fixes (including dynamic allocation of per-cpu data). Linus objected to the tree on the grounds that it hadn’t been in linux-next, and had been created only moments before posting with (potentially) little time for testing. Andrew Morton re-affirmed the lack of linux-next usage, adding “If this doesn’t mean ‘you missed 2.6.31’ then what does?” (he did also observe that there are some special cases such as this where some critical core kernel feature is modified and it’s not just “an ordinary old git merge like all the others”). The situation was clarified by Tejun: the git tree was being created from quilt patches that had been posted a number of times already, but there had been a glitch in the quilt import. He agreed that the lack of exposure in linux-next warranted delaying until 2.6.32 and stated that he would prep a tree for Stephen to pick up in linux-next soon.

Making executable pages the first class citizen. This podcast has covered this patch series several times before, but it is worth noting some feedback since this has now hit mainline, as Jesse Barnes pointed out. He found that one of his sample workloads went from creating an unusable machine to simply a slightly sluggish machine. Fengguang Wu was happy to hear this, but keen to point out that Rik van Riel had also helped with his protecting active file LRU pages from being flushed by streaming IO. On a VM tangent, Fengguang Wu also posted in response to the ongoing HWPOISON patchset with a modified version of the “only early kill processes who installed SIGBUS handler” which only does so for processes that register an interest in doing so via a prctl. This allows applications to easily be modified, without breaking existing expectations of applications currently deployed in the field.

Fixing returning from the kernel to tasks with a 16-bit stack. Alexander van Heukelum posted a detailed explanation and patch series, describing a bug in the kernel support (on x86 systems) for returning from the kernel into userspace tasks that use a 16-bit stack. Obviously, this doesn’t happen too often, but it does in emulation software such as WINE and dosemu. Due to a quirk in the manner in which an Intel processor restores state in such situations, only the lower 16 bits of the userspace stack pointer are preserved, while the upper 16 bits are kept from the kernel stack. The kernel has an existing special “espfix” segment that is abused to ensure that the upper 16 bits of the returning stack pointer will be correct, but this wasn’t always being setup correctly, especially not in a return from NMI.
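
The bit arithmetic behind this quirk can be sketched in a few lines of Python. This is a deliberately simplified model: the function names are invented, and the real “espfix” mechanism works by adjusting a stack segment base rather than by picking a different kernel stack pointer as the second function pretends.

```python
def iret_to_16bit_stack(kernel_esp, user_esp):
    """Stack pointer userspace sees after a return to a 16-bit stack segment:
    only the low 16 bits are restored, the high 16 bits leak from the kernel."""
    return (kernel_esp & 0xFFFF0000) | (user_esp & 0x0000FFFF)


def espfix_kernel_esp(kernel_esp, user_esp):
    """Simplified stand-in for espfix: arrange for the kernel-side value to
    already carry the user's upper 16 bits before returning."""
    return (user_esp & 0xFFFF0000) | (kernel_esp & 0x0000FFFF)


kernel_esp = 0xC1234F00  # made-up kernel stack pointer
user_esp = 0x0012FFE0    # made-up userspace stack pointer

# Without the fix the upper half of the kernel value leaks through.
leaked = iret_to_16bit_stack(kernel_esp, user_esp)

# With the fix applied first, userspace gets its own value back.
fixed = iret_to_16bit_stack(espfix_kernel_esp(kernel_esp, user_esp), user_esp)
```

In this model `leaked` carries the kernel’s upper half, while `fixed` equals the original `user_esp`, which is the property the espfix setup has to guarantee on every return path, NMI included.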

Architecture updates include: microblaze (generic headers switch), and Super H fixes from Paul Mundt. On a tangent, it looks like John Williams (the author of the microblaze port) has got a new .com email, possibly indicating a move.

Miscellaneous updates include: md updates from Neil Brown (including support for non-power of two chunk sizes in RAID0), ftrace updates from Steven Rostedt (including support for bypassing read locks inside the NMI handler – as you may know, Steven’s unique page swapping on read means we only need a lock on read, not on write to an active ring_buffer), a trivial documentation update to kthread_stop from Oleg Nesterov (reminding everyone that kernel threads can now call do_exit and be kthread_stop()ed, the two were previously mutually exclusive), cleanups to MAINTAINERS from Joe Perches, ext4 updates from Ted Ts’o, some relatively straightforward network stuff from David Miller (including wireless bits from John Linville, and bug fixes for NetXen and E100), and minimal HTC Dream Support (Google Android) via a reposted patch series from Brian Swetland (including some patches signed off by the somewhat quieter these days Robert Love).

Apologies to Gregory Haskins for not covering the latest iteration of his irqfd and eventfd work in detail, since it hasn’t changed hugely. But if you’d like to read about precisely how network packets are received and routed to KVM via vbus, take a look at the latest eventfd thread.

Non-merge specific concerns

Implementing direct mmap for FUSE/CUSE. Tejun Heo was busy today. In addition to posting per-cpu updates, he also posted the third version of a patchset implementing direct mmap support for FUSE/CUSE. This allows users of a FUSE filesystem to request an mmaped region, which will be satisfied on the backend by a kernel anonymous mapping, and still populated by the FUSE userspace server. The server gets to decide how mappings are shared so this has additional performance benefits for those implementing on FUSE/CUSE.

A rare race in TCP receive. Jiri Olsa posted to say that he had found a rare race in the TCP layer using an older RHEL4 kernel (that happens to be based upon 2.6.9, which is fairly long in the tooth). It turned out that, because of a missing smp_mb() and a combination of known errata in certain Intel CPUs, it was possible for tp->rcv_nxt updates made by one CPU to not propagate correctly to the others and result in a system sleeping forever. Jiri posted a patch citing the various errata, documentation, and including a fairly comprehensive analysis of the situation, although he said that he could not reproduce this upstream due to the rarity of its occurrence.

Fixing an overflow in sys_mount(). Today’s tip of the hat goes to Vegard Nossum, who diligently tracked down a bug reported by Ingo Molnar. It turns out that kernel code calling sys_mount() can be bitten by the fact that the aforementioned function will copy an entire page passed for the “type” parameter, even though less data is typically required for this string. If the content of the page happens to contain stray “wild” pointers, we might follow those and wreak some random havoc. Vegard (obviously) suggests stopping after we find the first NUL byte.
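
The difference between the two copying strategies can be shown with a small Python model. The function names and the byte-string “memory” are invented for the illustration; this is not the kernel’s code, only the idea of copying a fixed page versus stopping at the string terminator.

```python
PAGE_SIZE = 4096


def copy_whole_page(memory, addr):
    """Copy a full page starting at addr, regardless of where the string ends."""
    return bytes(memory[addr:addr + PAGE_SIZE])


def copy_string(memory, addr, limit=PAGE_SIZE):
    """Copy up to and including the first NUL byte, never more than `limit`."""
    window = memory[addr:addr + limit]
    nul = window.find(b"\0")
    return bytes(window if nul == -1 else window[:nul + 1])


# A short filesystem type string followed by unrelated junk.
memory = b"ext3\0" + b"\xde\xad\xbe\xef" * 2000

page = copy_whole_page(memory, 0)  # drags the trailing junk along
name = copy_string(memory, 0)      # just b"ext3\0"
```

The whole-page copy picks up 4091 bytes that have nothing to do with the string, which is exactly the kind of stray data a later consumer might misinterpret.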

Finally today, Randy Dunlap resurrected an email thread from several weeks ago in which it was proposed that references to the old “mm” tree be removed from the front page of kernel.org. He added that 2.2 kernels might go the same way.

The latest kernel release is 2.6.30, which was released by Linus on June 9th.

Stephen Rothwell posted a linux-next tree for June 18th. Since Wednesday, the tree contains a few fixes, some conflicts due to deltas between Linus’ ongoing changes to his tree and developer trees, and the tree still fails to build in an allyesconfig build configuration for powerpc.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 22, 2009 06:30 AM

June 21, 2009

Evgeniy Polyakov: Xeno


Yamaha 8335 (G?)S Xeno (click to enlarge)

I played this one, 8310ZS and Bach Stradivarius 180ML43S. 8335 Xeno has a magnificent sound even with my rather limited technique. I did not think about air, lips or fingers - it just played.

Bach sounded the worst among tested models, but that's maybe because I was not able to play it (I tried standard Bach 7C and Yamaha 14C mouthpieces, my student mouthpiece is 7C), or because it was defective, which I heard is a very common problem.

Highpower@ liked light 8310Z sound more than Xeno though, but I preferred my feeling despite his professional opinion. I just love the sound Xeno produces.


It's damn perfect!

Having such an instrument I must learn to play better than good, and eventually I will. As highpower@ says - 'let's listen you in two-three years'... Let's.

June 21, 2009 12:24 PM

June 20, 2009

Kernel Podcast: 2009/06/17 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090617.mp3

Support for this Podcast comes from the humble Blueberry. Did you know that a mere 4 pints of blueberries for breakfast can be a healthy form of OCD?

For Wednesday, June 17th 2009, I’m Jon Masters with a summary of the day’s LKML traffic.

In today’s issue: the continuing 2.6.31 merge window, changing the NOHZ idle load balancing logic, OpenAFS pioctls, MCE, and scsi_wait_scan configuration.

Apologies for the tardiness of today’s production. Your author is currently preparing updates to cover Thursday and the weekend podcast update and hopes to get back into the swing of things next week. I guess the merge window really is that unpleasant to keep up with – bear with me, I’ll get there. I expect to introduce more automation and tracking, and filtering, in time.

The Continuing 2.6.31 merge window

Poisonous Hardware. Fengguang Wu posted a policy change RFC patch, in which the HWPOISON code would only “early kill” (that is to say, before an unrecoverable error has occurred) processes that had installed a SIGBUS handler. This would allow certain applications (that caught SIGBUS) to recover from corruption of (for example) single pages within internal caches and other non-critical (isolatable) data. This might include, for example, the KVM (Kernel Virtual Machine) Hypervisor, Oracle’s database software, or similar programs using extensive internal caching to recover on memory errors.

Early SLAB allocation. Pekka J Enberg posted a series of SLAB updates for 2.6.31, which remember, include the new early SLAB allocator approach. In a separate mail thread, Linus Torvalds suggested that “All the recent init ordering changes should mean that the slab allocator is available _much_ earlier – to the point that hopefully any code that runs before slab is initialized should know very deep down that it’s special, and uses the bootmem allocator without doing any conditions what-so-ever”. Ben Herrenschmidt (the maintainer of the PowerPC architecture port) responded that, while he would normally agree with this, there are a number of hairy skeletons in the PowerPC port closet that prevent this from being true…yet. He pleaded for more time before things like slab_is_available() are taken away from him, and he’s probably not the only person who will be affected in such a migration.

e820 table reservations. e820 is a standard BIOS extension used by a PC-based Operating System, such as Linux, to query the system physical memory map, for example to determine where certain standard resources are located. The existing e820 parser in the kernel doesn’t handle regions marked as EFI_RESERVED_TYPE, so they might be recorded as usable. A patch from Cliff Wickman changes this by marking such regions as E820_RESERVED.

Searching for empty slots in resources trees. In PCI, we use BARs (Base Address Registers) to program devices with a range of the system (PCI) address space to use for interaction with the host system. For example, a card providing a large buffer needs to have that buffer mapped somewhere in memory. Andrew Patterson noticed that the function pci_assign_resource(), which calls find_resource and is used to allocate address ranges for PCI device BARs in the parent bridge’s resource tree during hot add operations, only checks immediate children and siblings of the root resource passed. In certain topologies where a resource (that is to say, range of memory) is only available further down the resource tree, the existing algorithms can fail to allocate an acceptable resource. Andrew posted a patch that modifies find_resource and allocate_resource so that they recursively descend the entire tree instead. Others (including Linus Torvalds) expressed some concern that Andrew’s patch might be curing symptoms rather than the actual disease, since the situation described shouldn’t easily be arising. Later, Matthew (willy) Wilcox posted a series of four patches covering this problem, fixing it by “changing where ia64 sets up the resource pointers in the root pci bus”.
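
The difference between a one-level search and a recursive descent can be sketched with a toy resource tree in Python. Everything here is hypothetical: the `Resource` class, the `busy` flag marking a range as fully occupied, and the function names are made up for the illustration, and alignment constraints are ignored entirely.

```python
class Resource:
    """Toy address range; children are sub-ranges carved out of it."""

    def __init__(self, name, start, end, busy=False, children=None):
        self.name, self.start, self.end = name, start, end
        self.busy = busy              # busy ranges are fully occupied
        self.children = children or []


def find_gap(parent, size):
    """Look for a free gap of `size` between parent's direct children only."""
    if parent.busy:
        return None
    pos = parent.start
    for child in sorted(parent.children, key=lambda r: r.start):
        if child.start - pos >= size:
            return (parent.name, pos)
        pos = max(pos, child.end + 1)
    if parent.end - pos + 1 >= size:
        return (parent.name, pos)
    return None


def find_gap_recursive(parent, size):
    """Same search, but descend into non-busy children when this level is full."""
    hit = find_gap(parent, size)
    if hit is not None:
        return hit
    for child in sorted(parent.children, key=lambda r: r.start):
        if not child.busy:
            hit = find_gap_recursive(child, size)
            if hit is not None:
                return hit
    return None


# A bridge window that covers the whole root, half of it taken by a device.
bar = Resource("device BAR", 0x0000, 0x7FFF, busy=True)
window = Resource("window A", 0x0000, 0xFFFF, children=[bar])
root = Resource("root", 0x0000, 0xFFFF, children=[window])
```

Here `find_gap(root, 0x1000)` comes back empty because the root’s only child spans it completely, while `find_gap_recursive(root, 0x1000)` finds the free upper half of the window one level down.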

Dynamic per-cpu. Tejun Heo posted version 3 of his dynamic per-cpu patchset. Per-CPU is a mechanism wherein Linux kernel code can split certain data into a data area per CPU, so that hot-path code can quickly make updates without being concerned about the actions of other CPUs. Like it sounds, this patchset makes per-cpu data area allocations entirely dynamic, rather than a compile-time determination. At David Miller’s request, individual maintainers were removed from the CC list and substituted with the more generic arch maintainers list. Separately, Tejun posted a patch (entitled “teach lpage allocator about NUMA”) which “makes the percpu allocator able to use non-linear and/or sparse cpu -> unit mappings and then makes the lpage allocator consider CPU topology and group CPUs in LOCAL_DISTANCE into the same large pages”.
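
As a loose analogy for how a dynamically allocated per-CPU variable behaves, the following Python sketch hands out an offset that is valid in every CPU’s private area. The class and its methods are invented for illustration and say nothing about how Tejun’s allocator is actually implemented.

```python
class PerCpuArea:
    """Toy model: one private unit (list of slots) per CPU."""

    def __init__(self, nr_cpus):
        self.units = [[] for _ in range(nr_cpus)]

    def alloc(self, initial=0):
        # Runtime allocation: grow every CPU's unit by one slot and
        # return the offset, which is the same in all units.
        for unit in self.units:
            unit.append(initial)
        return len(self.units[0]) - 1

    def add(self, offset, cpu, n=1):
        # Hot path: a CPU touches only its own copy.
        self.units[cpu][offset] += n

    def read(self, offset, cpu):
        return self.units[cpu][offset]

    def total(self, offset):
        # Slow path: fold all CPUs' copies together.
        return sum(unit[offset] for unit in self.units)
```

A counter allocated this way can be bumped independently by each CPU and only summed when someone asks for the global value.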

VFS patches, part 2. Al Viro posted a series of VFS patches, mostly targeting BKL (Big Kernel Lock) removal in both the VFS and in filesystems. The Big Kernel Lock (BKL) was introduced in the earliest days of Linux SMP support written by Alan Cox as a means to have an extremely coarse-level “kernel lock” (exactly one CPU could be executing kernel code at a time), but it has long since become a performance bottleneck and is slowly being removed. Previous kernels have attempted to replace it with a semaphore (which was reverted, again for performance related reasons), and the RT tree still does so. Separately, Jan Blunck posted a series of patches preparing for the VFS based union mounts. He and Val think these are good to go in separately.

PCI updates for 2.6.31. Jesse Barnes posted a summary of pending changes in his git tree. These include improved PCI AER (Advanced Error Reporting) support (refer to the pciaer-howto for further information), the removal of pci_find_slot, and a collection of the usual cleanups and fixes.

FireWire updates post 2.6.30. Stefan Richter posted a few IEEE1394 (firewire) updates for 2.6.31. These included the newer sysfs attributes mentioned previously that should lead to “simpler and saner udev rules”.

Miscellaneous updates include: some trivial fixes for the ksym_tracer from K. Prasad, V4L/DVB updates from Mauro Carvalho Chehab, kmemleak fixes from Catalin Marinas (who also wishes to rename kmemleak_panic to kmemleak_stop to avoid confusion over the use of the “panic” word), UBI and UBIFS fixes from Artem Bityutskiy, some exofs patches from Boaz Harrosh, and a patch series adding software (not hardware) counters for PowerPC 32-bit. Discussion continued on the idea of handling page faults on x86 with interrupts enabled, adding a little complexity to the interrupt handler but intending to reduce overall overhead in the process.

Non-merge specific concerns

Changing the NOHZ idle load balance logic. Venkatesh Pallipadi posted a two part patch series aimed at changing the NOHZ idle load balance logic from the “pull” model currently in use (in which one idle load balancer CPU is nominated to not go into NOHZ mode and ends up doing all the balancing work for CPUs in the NOHZ mode) to a “push” model in which busy CPUs can kick those that are idle (and in NOHZ mode) into taking care of idle balancing on behalf of a group of idle CPUs. Apparently, there are still some “rough edges”, and so this is an RFC for the moment.

OpenAFS pioctls. OpenAFS is an implementation of the Andrew distributed filesystem, which is especially popular with banks and international corporations. David Howells posted a 17 part patch series implementing an in-kernel pioctl system call, as used by OpenAFS. Alan Cox objected to the “ugly” nature of the ABI, and asked why David couldn’t instead use the C-library system call wrapper (all system calls end up with a small wrapper in the system C-library) to do what this system call would otherwise do using those already available. David replied that it was almost possible to do this, but that it got very hairy and that he also wanted the kAFS and OpenAFS implementations to be able to share userspace tools without recompiling.

MCE test coverage data. Huang Ying posted to let everyone know about his mce-inject test tool (with git repository) and about further test information being available on his kernel.org people page.

Finally today, the “lack” of a configuration option for scsi_wait_scan was addressed in the form of documentation (from Stefan Richter) explaining why it has intentionally been omitted. The SCSI wait scan module is used (especially by distributions, in their initrds) in order to wait for SCSI device enumeration activity to complete. It does this by simply not returning from module_init until the SCSI subsystem is ready to proceed. It is needed by some users and accidental removal can lead to hard to debug boot failures, although removing the config option does seem excessive.

In today’s announcements: Thomas Gleixner announced version 2.6.29.5-rt21 of the Real Time patchset. The latest version includes a fix for a rather unpleasant “lockup” scenario in the softirq handling code. There was no announcement for the previous -rt20 release due to this softirq issue.

The latest kernel release is 2.6.30, which was released by Linus June 9th.

Stephen Rothwell posted a linux-next tree for June 17th. Since the previous day, the powerpc tree continues to fail to build in an allyesconfig build configuration, the ext4 build failure means that a version from Monday is being used, the v4l-dvb tree lost its conflict, and the KVM tree gained a build failure (due to PowerPC now using -Werror), for which Stephen applied a quick patch. Total tree count remains at 128 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 20, 2009 06:12 PM

Harald Welte: Palm Pre wanted for FOSS hackers

A number of people from the various community-based Linux mobile phone projects (OpenEZX, gnufiish, freesmartphone.org, openmoko, openembedded) are interested in adopting the Palm Pre into the portfolio of supported devices.

If anyone wants to support those communities with Palm Pre hardware, please let me know. Right now, all the people I know are in Europe. Yes, we don't have CDMA here - but those hackers don't care. All they want is to make sure you can build a number of different distributions on the application processor, and to support everything _but_ the CDMA modem in preparation for the GSM variant that is to be released at some future point.

June 20, 2009 02:00 AM

Harald Welte: ScummVM settles GPL dispute with Mistic Software

As you can see from this press release, ScummVM accused Mistic Software and its distributors of infringing the GNU GPL in some proprietary games based on ScummVM.

As it seems, this case was now settled. The press release does not make any statement on how the actual GPL issues were solved (i.e. "where is the source code"), but I would assume they would not want to settle unless the conditions of the GPL are fulfilled...

If anyone has more information, I'm interested to learn about that.

June 20, 2009 02:00 AM

June 19, 2009

Evgeniy Polyakov: A zen of climbing

It is Friday and climbing training again. This time it was a rather simple one; I even managed to do chin-up exercises at the end on a hold on the negative slope (as on a campus board).
But initially I climbed a couple of new traces and repeated 3 runs of a one-after-another 6c trace (well, without its complex start on the negative horizontal slope, so it is a cheat). One of the new traces was something about 6a (maybe even 5c) on the vertical wall - and was finished on-sight.
Another trace is something really complex, like 7a, but again I skipped the complex negative horizontal balcony and fell in the middle (but continued) and at the very end (where I still do not know how to move). I will try different negative horizontal starts as the warm-up/beginning of the training - when I'm quite fresh and the exercise does not take a long time, I will quickly recover.

Expect good progress, since I believe that I'm in really good shape now.

Climbing is a freaking cool sport - everything strains, aches, relaxes and it just fires the roof!

June 19, 2009 09:51 PM

Dave Jones: Fedora 11 released, onwards to F12.

With the release of Fedora 11 recently, we are now back in the position of looking onwards to the next release. F11 released with a 2.6.29 kernel, and we’re already looking at doing a .30 rebase for it soon. (I was hesitant to type that, because the last time I blogged about doing a rebase, we hit some troubles and ended up skipping the 2.6.28 release for F10). While F11 was stabilising before release, devel/ had continued on, being rebased up to 2.6.30. Yesterday Kyle committed the first rebase to get us back up to Linus’ tree of the day.

So we’re looking at 2.6.31 for F12. (With various conferences coming up over the next few months, it seems infeasible that .32 will land in time for F12’s release).

Other planned changes? We’ve talked about dropping the exec shield patch that we’ve been carrying since Fedora Core 1.
It’s a pain to have to keep carrying it, and rebasing it, and occasionally fixing it, just to add a poor emulation of a feature that has been in all CPUs for the last five years. The decision isn’t final yet, but it’s something that’s being considered.

Some of the other patches we’ve been carrying (like modesetting drm) are now starting to find their way upstream too, which is obviously a good thing. The only really big thing we’re still carrying that struggles to get upstream is utrace.

Post from: codemonkey.org.uk

June 19, 2009 03:36 PM

Evgeniy Polyakov: Advanced merge conflict resolver has been merged

Elliptics network is now able to work in a network split setup and system will merge different history logs when node rejoins.

There are 5 different strategies (described from joining node's point of view):

The first two are obvious - we discard either remote or local history in favour of the selected version. All updates stored in the discarded log will be lost. The next two versions are also quite simple - we find a common ancestor and then apply the rest of the selected history log on top of the base one. In this case the first part of the merged log will contain either full remote or local history (depending on the selected strategy) and the second part will contain the rest of the other history.

Here is an example. That's how original history looks on both servers:

$ ./example/dnet_hparser -f /tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history 
/tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history: 
  objects: 5, range: 0-0, counting from the most recent (nanoseconds resolution).
2009-06-19 17:57:43.470096000: 32ea6116: flags: 00000000, offset:     2048, size:     2048: -
2009-06-19 17:57:40.814129000: af24ab16: flags: 00000000, offset:     8192, size:     4096: -
2009-06-19 17:57:40.656900000: 0054b454: flags: 00000000, offset:     4096, size:     4096: -
2009-06-19 17:57:40.503311000: 6cdd3cb9: flags: 00000000, offset:        0, size:     4096: -
2009-06-19 17:57:40.540129000: 6cdd3cb9: flags: 00000000, offset:        0, size:    12288: -
$ ./example/dnet_hparser -f /tmp/elliptics-test/root0/ff/ff00000000000000000000000000000000000000.history 
/tmp/elliptics-test/root0/ff/ff00000000000000000000000000000000000000.history: 
  objects: 5, range: 0-0, counting from the most recent (nanoseconds resolution).
2009-06-19 17:57:42.163033000: 51b89998: flags: 00000000, offset:     1024, size:     3072: -
2009-06-19 17:57:40.814129000: af24ab16: flags: 00000000, offset:     8192, size:     4096: -
2009-06-19 17:57:40.656900000: 0054b454: flags: 00000000, offset:     4096, size:     4096: -
2009-06-19 17:57:40.503311000: 6cdd3cb9: flags: 00000000, offset:        0, size:     4096: -
2009-06-19 17:57:40.540129000: 6cdd3cb9: flags: 00000000, offset:        0, size:    12288: -

Notice that the most recent (first) line of each listing, highlighted in the original post, is different and should be merged. That's how this will look after one of the merge strategies is applied:

$ ./example/dnet_hparser -f /tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history 
/tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history:
  objects: 6, range: 0-0, counting from the most recent (nanoseconds resolution).
2009-06-19 17:57:43.470096000: 32ea6116: flags: 00000000, offset:     2048, size:     2048: -
2009-06-19 17:57:42.163033000: 51b89998: flags: 00000000, offset:     1024, size:     3072: -
2009-06-19 17:57:40.814129000: af24ab16: flags: 00000000, offset:     8192, size:     4096: -
2009-06-19 17:57:40.656900000: 0054b454: flags: 00000000, offset:     4096, size:     4096: -
2009-06-19 17:57:40.503311000: 6cdd3cb9: flags: 00000000, offset:        0, size:     4096: -
2009-06-19 17:57:40.540129000: 6cdd3cb9: flags: 00000000, offset:        0, size:    12288: -

or

$ ./example/dnet_hparser -f /tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history 
/tmp/elliptics-test/root1/ff/ff00000000000000000000000000000000000000.history:
  objects: 6, range: 0-0, counting from the most recent (nanoseconds resolution).
2009-06-19 17:57:42.163033000: 51b89998: flags: 00000000, offset:     1024, size:     3072: -
2009-06-19 17:57:43.470096000: 32ea6116: flags: 00000000, offset:     2048, size:     2048: -
2009-06-19 17:57:40.814129000: af24ab16: flags: 00000000, offset:     8192, size:     4096: -
2009-06-19 17:57:40.656900000: 0054b454: flags: 00000000, offset:     4096, size:     4096: -
2009-06-19 17:57:40.503311000: 6cdd3cb9: flags: 00000000, offset:        0, size:     4096: -
2009-06-19 17:57:40.540129000: 6cdd3cb9: flags: 00000000, offset:        0, size:    12288: -

The same history will be placed both on local and remote nodes.
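
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the elliptics code: the strategy names other than the "fail" case are invented here, entries are reduced to (timestamp, id) pairs, and histories are kept oldest-first (the opposite of the dnet_hparser output above) so that the common ancestor is simply the shared prefix.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[str, str]  # (timestamp, transaction id); a simplification


def common_prefix_len(a: Sequence[Entry], b: Sequence[Entry]) -> int:
    """Number of leading entries shared by both histories (the common ancestor)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def merge_histories(local: Sequence[Entry], remote: Sequence[Entry],
                    strategy: str) -> List[Entry]:
    """Merge two oldest-first histories; strategy names are hypothetical."""
    if strategy == "keep_local":       # discard the remote log entirely
        return list(local)
    if strategy == "keep_remote":      # discard the local log entirely
        return list(remote)
    n = common_prefix_len(local, remote)
    if strategy == "local_then_remote":  # full local log, then the rest of remote
        return list(local) + list(remote[n:])
    if strategy == "remote_then_local":  # full remote log, then the rest of local
        return list(remote) + list(local[n:])
    if strategy == "fail":             # refuse to merge diverged histories
        if list(local) != list(remote):
            raise ValueError("histories do not match and fail strategy was selected")
        return list(local)
    raise ValueError("unknown strategy: " + strategy)


# The two example logs above, oldest entry first.
shared = [
    ("17:57:40.540129000", "6cdd3cb9"),
    ("17:57:40.503311000", "6cdd3cb9"),
    ("17:57:40.656900000", "0054b454"),
    ("17:57:40.814129000", "af24ab16"),
]
root1 = shared + [("17:57:43.470096000", "32ea6116")]
root0 = shared + [("17:57:42.163033000", "51b89998")]
```

Treating root1 as the local log, either of the two "on top" strategies yields six entries, four shared plus one from each side, differing only in which new entry comes last. That matches the two merged listings shown above.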

The last merge strategy - DNET_MERGE_FAIL (4) will fail with the following lines in the log:

2009-06-19 18:03:27.354246 2: ff000000: histories do not match and fail strategy was selected.
2009-06-19 18:03:27.354485 8: ff000000: failed to merge histories, err: -22.

Automatic test for this functionality (written in bash) is comparable in size with the feature itself.

There are only two issues left to implement in the elliptics network to be considered complete.
One of them is a known problem when joining node does not advertise objects it stores which are outside of the specified range, while it should merge them into the network. Although solution exists, it was not yet tested in this particular case - it is exactly the same as syncing failed range to the neighbour node.
Second problem is unlink support - objects have to maintain reference counter for the finer-grained deletion. Since transaction with the same content ends up in the same object thus performing automatic data deduplication, we can not simply delete it if it is referenced by two or more objects outside. Reference counting allows to remove object only when it is not referenced by any other object. Given that each low-level transaction (i.e. that one which contains some data and not a history update) has a history of the objects it is referenced from, deletion should not be a major problem.
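
The reference counting idea for unlink can be illustrated with a small Python sketch. It is a toy in-memory store, not the elliptics design: the DedupStore class and its method names are made up, and SHA-1 is chosen here only as an example of content addressing.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Toy content-addressed store with per-object reference counts."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._refs: Dict[str, int] = {}

    def put(self, content: bytes) -> str:
        """Store content; identical content maps to the same key and is kept once."""
        key = hashlib.sha1(content).hexdigest()
        if key not in self._data:
            self._data[key] = content
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def unlink(self, key: str) -> bool:
        """Drop one reference; return True only if the data was actually removed."""
        if key not in self._refs:
            raise KeyError(key)
        self._refs[key] -= 1
        if self._refs[key] == 0:
            del self._refs[key]
            del self._data[key]
            return True
        return False

    def __contains__(self, key: str) -> bool:
        return key in self._data
```

Two writers storing the same bytes get the same key back, and the data only disappears once both of them have called unlink.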

After those tasks are finished I consider project as completed and it will be moved into bug fixing mode. To date I do not see any other features needed to be implemented in the core library (but I do remember about PAXOS-based locking for the backed up histories).

So I plan to have a small rest after it and work on a regexp state machine implementation, LR grammars and knowledge extraction. A small and rather simple AI bot should be developed for some mail lists this year, and I expect a lot of fun working with it.

In a week or so I will start porting POHMELFS to the elliptics network.

June 19, 2009 02:18 PM

June 18, 2009

Jens Axboe: Cheaper SSD reliability?

In earlier blog entries, I praised the Intel X25-E for its performance. I also have high hopes for the reliability of the drive. By reliability, I refer to data integrity as well as endurance. It's not that I have much information to back this up, but I know that Intel have put a lot of testing into them.

So while the X25-E is extremely nice, it's also very pricey. Recently I needed a few more SSD's for testing purposes, and despite public begging on this blog, Intel hasn't sent me any more drives. As I'm sure most are well aware, the cheaper SSD drives were mostly utter crap. Even most expensive SSD drives have been crap, mostly due to using that infamous JMicron flash controller that would have done more good as nice sand on the beach instead of being manufactured into silicon. Now there are other alternatives though, and the Indilinx controller looked like a good option. OCZ recently introduced a Vertex series that uses this controller. Not only does it perform better, it's also not 80s ATA tech. It has NCQ and TRIM support, which is very nice.

I went out and bought a few of these for testing. One I put in the laptop and the other in a test box. Performance is good, even random 4kb writes actually work. This is where the crap SSD's fall apart. However, as opposed to the Intel drive, I didn't have a lot of faith in the reliability of these drives. Early firmwares were plagued with errors, and even the just released v1.30 firmware fixes issues that seem like rather basic functionality. An example of that would be mishandling ATA commands with zero sector count. But I decided to give them the benefit of the doubt.

A few days ago, I was working on the laptop at night as usual. As I was pushing out a few changes from my block git repository, git complained of a corrupt pack file. The pack file in question was from when I last repacked the repo back in February. It's read-only, about 380MB in size, and thus hasn't been written to since it was created some 4 months ago. I usually don't keep backups of my laptop data, since it's just a development environment and all my source is safe with git on a public server. As it just so happened, I had tested the new btrfs format a week earlier. In doing so, Chris asked me to keep an image of the drive so we could debug any potential problems with the new format. So I went and fetched the pack file from the backup and compared the two. The backup file was, as expected, fine. Looking into the nature of the corruption (basically finding out who to blame for the corruption), I found out that the corruption started 64519680 bytes into the file. So that's nicely 512b aligned, but not 4kb aligned. The corruption spanned 16KB in total. So far, so good. What I found out next was even more interesting: every other byte in the file was correct, every other byte was 0xff!
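
For anyone wanting to repeat this kind of analysis, here is a rough Python sketch of the two checks described above: the alignment arithmetic on the starting offset, and a test for the "every other byte is 0xff" pattern. The function names are made up for illustration and this is not the tooling actually used.

```python
from typing import Optional


def describe_offset(offset: int) -> dict:
    """Alignment facts about a byte offset within a file."""
    return {
        "sector_512": offset // 512,
        "aligned_512": offset % 512 == 0,
        "aligned_4k": offset % 4096 == 0,
        "offset_in_4k_block": offset % 4096,
    }


def first_difference(good: bytes, bad: bytes) -> Optional[int]:
    """Index of the first differing byte, or None if the common part is identical."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    return None


def looks_like_alternating_ff(good: bytes, bad: bytes, start: int, length: int) -> bool:
    """True if, in the window, bytes of one parity are 0xff and the rest are intact."""
    window = range(start, min(start + length, len(good), len(bad)))
    if len(window) == 0:
        return False
    for parity in (0, 1):
        if all(
            bad[i] == 0xFF if (i - start) % 2 == parity else bad[i] == good[i]
            for i in window
        ):
            return True
    return False


info = describe_offset(64519680)
```

For the offset quoted above, 64519680 is exactly 126015 sectors of 512 bytes, and it sits 3584 bytes into its 4KB block, which is why it is 512b aligned but not 4kb aligned.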

That type of corruption just reeks of drive problems. I reported this issue to OCZ about a week ago. First level support quickly replied and passed the issue on to the engineers, but I have yet to hear anything from that side. I've kept the drive as-is if they want to inspect it. I'm not keeping my hopes up though, and I'm glad I'm not the OCZ engineer tasked with fixing it.

Meanwhile I put the other Vertex in the laptop and recreated my git tree. No issues seen so far, but suffice to say that my confidence level in these drives isn't that high. I'll be keeping backups if I put anything interesting on the laptop!

June 18, 2009 10:28 AM

Kernel Podcast: 2009/06/16 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090616.mp3

Correction: Due to an editing error, the June 16th edition of the LKML Podcast incorrectly stated that Pekka Enberg was the driving force behind a push for GFP_BOOT. In fact, Nick Piggin is the primary push behind that, while Pekka has stated several times that he is in fact comfortable with either approach.

For Tuesday, June 16th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: the continuing 2.6.31 merge window, bulk CPU hotplug, and interrupts during pagefault.

The Continuing 2.6.31 merge window

Kernel development times. Greg Kroah-Hartman and Luis R. Rodriguez had an exchange of emails concerning Luis’ new rc-series and merge window docs. Greg questioned Luis’ figures for previous kernel release dates and development times, and Luis ultimately accepted Greg’s version of events. In Greg’s figures, over the last 10 kernel cycles, the minimum development time between kernels was 68 days (2.6.20) and the maximum was 108 (2.6.24). This places the next kernel release sometime in the early days of September.
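
As a quick sanity check on that estimate, the arithmetic can be sketched in Python. The starting point is an assumption taken from elsewhere in these podcast notes (2.6.30 released on June 9th, 2009), and using the midpoint of the range is just one plausible reading of "early days of September", not necessarily how Greg arrived at it.

```python
from datetime import date, timedelta
from typing import Tuple


def release_window(previous_release: date, min_days: int, max_days: int) -> Tuple[date, date, date]:
    """Earliest, midpoint and latest expected release dates for the next kernel."""
    earliest = previous_release + timedelta(days=min_days)
    latest = previous_release + timedelta(days=max_days)
    midpoint = previous_release + timedelta(days=(min_days + max_days) // 2)
    return earliest, midpoint, latest


# Assumption: 2.6.30 was released on 2009-06-09; 68 and 108 are Greg's min/max.
earliest, midpoint, latest = release_window(date(2009, 6, 9), 68, 108)
```

With those inputs the window runs from August 16th to September 25th, with the midpoint on September 5th.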

Early SLAB allocation. Nick Piggin and Ben Herrenschmidt continued a debate between themselves (with occasional others chipping in) concerning whether it was appropriate to introduce special “boot time” versions of the kmalloc and vmalloc function calls (or more specifically, adding special boot time GFP_ flags that should be passed if an allocation might take place early in boot). Ben Herrenschmidt pointed out that there are many points at which allocations might happen and we wouldn’t think to use special flags – for example, even during suspend/resume one might be trying to perform a memory allocation that blocks pending IO to a disk that has already since gone offline. Ben pointed out that, in such cases, it’s far more likely to work out for the best if infrastructure components automatically degrade such that (for example) kmalloc automatically uses GFP_NOIO once suspend has started.

USB. Greg Kroah-Hartman posted a large number of updates via his git tree and requested Linus merge. Amongst the updates were USB 3.0 support (see Sarah Sharp’s blog posting for the details), various new drivers, Unicode bugfixes, power management, and core code cleanups. There were a few non-USB related patches that the tree depends upon but these had all received the blessings of those subsystems affected. Greg also posted a series of driver core patches – most of which were minor in nature – and these included API cleanups and documentation.

Btrfs. Chris Mason followed up again concerning changes to the physical on-disk format for Btrfs, noting that newer kernels (those post 2.6.30) will roll forward existing filesystems to a format not supported by older kernels. In order to help developers who might be using Btrfs, Chris posted some rescue disk images based upon the Arch Linux 2.6.30 distro to his kernel.org pages. These contain enough filesystem checking tools to repair damage, as well as git, gcc, make, and enough to compile a kernel. Separately the Fedora folks posted to fedora-devel announcing that rawhide will be picking up the format change in due course, and reminding everyone that breakage is entirely possible, and that Btrfs won’t be ready for prime time for a year yet. Also on the filesystem front came some minor updates for OCFS2 from Joel Becker (although he noted that these were almost entirely fixes), and David Howells posted some updates to the AFS filesystem support code.

Kdump crashkernel breakage. Chris Wright pointed out that a recent change to CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN will impact those who follow the documentation to use their (relocatable) kdump crashkernel loaded with a 64MB or 128MB window at 16MB. Doing so will now interfere with the stock kernel because it has moved from the old default physical start of 2MB. Chris suggests that the problem here is that the documentation needs to be updated reflecting this change since 2.6.30, but sought input.

Adding formatting to WARN(). Linus Torvalds and Ingo Molnar debated Arjan van de Ven’s idea of adding “\n” formatting to the WARN() macro, for the ease of formatting in kernel log files (and less corruption to logs posted on kerneloops.org). Linus liked the idea as much as Ingo, but he felt that blanketly applying formatting to all users would adversely affect existing “naked” printk’s at this point, and he didn’t much like the idea of forcing those users to migrate to using KERN_CONT explicitly. So, in true Linus style, Linus wrote a cunning macro that tries to do the right thing, only adding a “\n” if a KERN_xyz level is included at the start of the string, and changing the implementation of KERN_CONT so that it can still be used for continuation.

On a related note, Mike Frysinger posted an RFC patch series implementing a series of useful new functions for printk()ing during initcalls. Rather than simply using printk() directly, these wrappers – which include, for example, pr_info_init() and pr_cont_init() (for printk continuation) – cause the accompanying string to be stored in a separate ELF section of the kernel linked binary image(s), so they can be unloaded as well as the initdata.

Also on the printk() front, Dave Young posted a generic version of the previous printk delay implementation for use during normal system operation. So, Linux now has an ability to insert delays between printk() messages on boot, on halt, and during normal operation with the use of a sysctl. This is specifically intended for certain kinds of embedded (and also similar) systems where it might not be easy to capture kernel output without a delay insertion.

Architecture updates include: Power management updates for s390, Blackfin, and SPARC. The latter gained dynamic per-cpu allocator support, and a new syscall. Jeremy Fitzhardinge posted some minor io_apic cleanups for x86 which he had noticed while pursuing his Xen work, these included further 32/64-bit merge fallout, loop restructuring, and comment fixing.

Non-merge specific concerns

Bulk CPU Hotplug support. Gautham R Shenoy posted an RFD patch series aimed at opening discussion surrounding the best way to move forward from the current CPU hotplug implementation. The current code allows one to online and offline a single “CPU” at a time, but this “CPU” might in fact be part of a multi-core processor or even larger package, where performing a whole series of CPU Hotplug events to take down the package is much slower than need be. Gautham posted some benchmarks (for PPC64 systems) and a fairly detailed proposal in which one could echo comma separated lists of CPUs to online or offline as a unit via the /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline sysfs entries.

Interrupts during page fault (to trap or not to trap?). As part of a thread entitled “perf_count: x86: Fix call-chain support to use NMI-safe methods”, Ingo Molnar, Mathieu Desnoyers, and others engaged in a lively discussion surrounding the overhead of disabling interrupts during page faults and re-enabling them afterward (a cli/sti cycle doesn’t come free). Currently, Linux uses x86 architecture “interrupt gates” rather than “trap gates” in order to ensure interrupts are disabled starting from the moment that a page fault condition is generated. This is in order to prevent the Intel architectural “CR2” control register from being “messed up” by other subsequent interrupts. But if this register state is saved by the kernel within the IRQ handler instead, then the overhead (in this case of a special purpose register – SPR – write) is moved from the page fault handler having to disable/enable interrupts into the interrupt handler, which will now have to write to CR2 under certain circumstances. Ingo performed various benchmarks and agreed with Mathieu that this was an overall win, since a typical x86 system is likely to see an order of magnitude more page faults than interrupts.

In today’s announcements: lio-utils v3.0 configfs HOWTO for v2.6.30. Nicholas A. Bellinger announced a new HOWTO for Linux-iSCSI.org Target v3.0 users.

The latest kernel release is 2.6.30, which was released by Linus last Tuesday.

Stephen Rothwell posted a linux-next tree for June 16th. Since Monday, the kmemleak tree was removed (since it had served its purpose of testing the newer kmemleak patches), the tree continues to fail to build in an allyesconfig powerpc build time configuration, and a large number of other trees lost conflicts as the merge process continues. The total tree count is now down to 128 sub-trees, with the removal of kmemleak contributing to that.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

June 18, 2009 09:41 AM

Harald Welte: Palm has released sources for WebOS 1.0.1

On this page, Palm has started to release source code + patches for a number of FOSS programs that they use in the Pre. I suppose the page is only an interim solution, since neither the site nor the page URL yet really seems to account for OS updates, etc.

Of course I have no idea yet if those sources can be considered complete and corresponding, but at least an initial look seems quite promising.

I've spent about 10 minutes looking at their 9 MByte (!) kernel patch against vanilla 2.6.24. The modem interface seems to be a UART + USB. The UART is required for stuff like waking up the OMAP3 from the baseband, and then you use it to set up a USB connection to the modem, where a hacked/extended version of the cdc-acm driver appears to be used.

I don't have time to look further into it, but I'm sure somebody with actual device hardware will - now that the source is out there.

June 18, 2009 02:00 AM

June 17, 2009

Valerie Aurora: Chunkfs summary on LWN

"Whatever happened to chunkfs?" Y'all can stop asking me now...

LWN article on chunkfs (the usual, pay-only for the first week, really teeny payment)

If $5/month is beyond your means, all the source data is here:

This article is by no means a research paper and I feel that keenly, but the ennui is so strong that this is the best I could do. I gave a chunkfs talk at a friend's company yesterday and I could barely bring myself to finish the slides.

Oh, yes, please consider designing your storage system for fast check and repair (see papers above for ideas). Your system will get corrupted, you will have to fix it, and you will thank yourself for thinking about it in advance. :)

June 17, 2009 04:43 PM

Linus Torvalds: Outwitting the fashion police

This is a public service announcement for all geeks.

Are you tired of people pointing out that you shouldn't use socks and sandals? I know, it really annoyed me too. It's like they are trying to take away your geek card.

But there's a solution.

For a year now, I've been avoiding the fashion police by instead of "sandals" wearing "shoes with holes in their sides". I've got these Keen's that look enough like shoes that nobody ever bats an eye at you wearing them with socks (Ok, by "nobody", I mean my wife, but that's all that matters, right?).

The problem is that it looks like the fashion police may be starting to figure it out. The model I have seems to be no longer in production, and now all the new ones I find are pretty obviously sandals (toes and/or heel showing).

So when I wear out my current ones, I'm going to be in trouble again. Damn.

June 17, 2009 04:33 PM