Kernel Planet
June 20, 2013
Continued work tracking down the mysterious soft-lockup case I seem to be triggering with ease. Thought at one point I’d nailed it after building a kernel with a recent RCU patch reverted. But it just took longer to hit.
Set up a second system with the same hardware just to rule out some kind of weird hardware bug. Reproduced the bug on that instantly.
Upside: Now I have two test machines. Downside: Still no closer to figuring out what the hell is causing the bug.
Tomorrow involves git-bisect.
Daily log June 19th 2013 is a post from: codemonkey.org.uk
June 20, 2013 04:36 AM
After nearly 3 years in on-again/off-again development, MirrorManager 1.4 is now live in the Fedora Infrastructure, happily serving mirrorlists to yum, and directing Fedora users to their favorite ISOs – just in time for the Fedora 19 freeze.
Kudos go out to Kevin Fenzi, Seth Vidal, Stephen Smoogen, Toshio Kuratomi, Pierre-Yves Chibon, Patrick Uiterwijk, Adrian Reber, and Johan Cwiklinski for their assistance in making this happen. Special thanks to Seth for moving the mirrorlist-serving processes to their own servers where they can’t harm other FI applications, and to Smooge, Kevin and Patrick, who gave up a lot of their Father’s Day weekend (both days and nights) to help find and fix latent bugs uncovered in production.
What does this bring the average Fedora user? Not a lot… More stability – fewer failures with yum retrieving the mirror lists, not that there were many, but it was nonzero. A list of public mirrors where the versions are sorted in numerical order.
What does this bring to a Fedora mirror administrator? A few new tricks:
- Mirror admins have been able to specify their own Autonomous System Number for several years. Clients on the same AS get directed to that mirror. MM 1.4 adds the ability for mirror admins to request additional “peer ASNs” – particularly helpful for mirrors located at a peering point (say, Hawaii), where listing lots of netblocks instead is unwieldy. As this has the potential to be slightly dangerous (no, you can’t request ALL ASNs be sent your way), ask a Fedora sysadmin if you want to use this new feature – we can help you.
- Multiple mirrors claiming the same netblock, or overlapping netblocks, were returned to clients in random order. Now they will be returned in ascending netblock size order. This lets an organization that has a private mirror, and their upstream ISP, both have a mirror, and most requests will be sent to the private mirror first, falling back to the ISP’s mirror. This should save some bandwidth for the organization.
- If you provide rsync URLs, you’ll see reduced load from the MM crawler as it will now use rsync to retrieve your content listing, rather than a ton of HTTP or FTP requests.
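The netblock ordering described in the list above can be sketched in a few lines of Python. This is only an illustration of "ascending netblock size order", not MirrorManager's actual code: the function name, the mirror names and the netblocks are all made up for the example.

```python
import ipaddress


def order_mirrors(client_ip, mirrors):
    """Return the names of mirrors whose netblock contains client_ip,
    smallest (most specific) netblock first.

    `mirrors` is a list of (name, cidr) pairs; both are hypothetical here.
    """
    addr = ipaddress.ip_address(client_ip)
    matches = []
    for name, cidr in mirrors:
        net = ipaddress.ip_network(cidr)
        # Skip IPv4/IPv6 mismatches, then keep only netblocks covering the client.
        if addr.version == net.version and addr in net:
            matches.append((net.num_addresses, name))
    # Ascending netblock size: a private /16 sorts ahead of the ISP's /8.
    matches.sort(key=lambda m: m[0])
    return [name for _, name in matches]


# Invented example data: an organization's private mirror inside its ISP's block.
mirrors = [
    ("isp-mirror", "10.0.0.0/8"),
    ("campus-mirror", "10.20.0.0/16"),
    ("other-mirror", "192.168.0.0/16"),
]

print(order_mirrors("10.20.30.40", mirrors))
```

A client inside the campus block gets the campus mirror first and the ISP's mirror as the fallback, which is the bandwidth-saving behaviour described above.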
What does this bring Fedora Infrastructure (or anyone else running MirrorManager)?
- Reduced memory usage in the mirrorlist servers. This is critical given how bad Python is at memory management on x86_64 (e.g. reading in a 12MB pickle file blows out memory usage from 4MB to 120MB). This directly impacts the number of simultaneous users that can be served, the response latency, and the CPU overhead too – it’s a win-win-win-win.
- An improved admin interface – getting rid of hand-coded pages that looked like they could have been served by BBS software on my Commodore 64 – for something modern, more usable, and less error prone.
- Code specifically intended for use by Debian/Ubuntu and CentOS communities, should they decide to use MM in the future.
- A new method to upgrade database schemas – saner than SQLObject’s method. This should make me less scared to make schema changes in the future to support new features. (yes, we’re still using SQLObject – if it’s not completely broken, don’t fix it…)
- Map generation moved to a separate subpackage, to avoid the dependency on 165MB of python-basemap and python-basemap-data packages on all servers.
MM 1.4 is a good step forward, and hopefully I’ve laid the groundwork to make it easier to improve in the future. I’m excited that more of the Fedora Infrastructure team has learned (the hard way) the internals of MM, so I’ll have additional help going forward too.
June 20, 2013 01:39 AM
June 19, 2013
Mir is Canonical's equivalent to Wayland - a display server, responsible for getting application pixmaps onto a screen. It's intended to scale from mobile devices to the desktop, and as such is expected to turn up in Ubuntu Phone before too long[1]. There's already plenty of discussion about whether the technical differences between Wayland and Mir are sufficient to justify Canonical going their own way, so I'm not planning on talking about that.
Like many Canonical-led projects, Mir is under GPLv3 - a strong copyleft license. There's a couple of aspects of GPLv3 that are intended to protect users from being unable to make use of the rights that the license grants them. The first is that if GPLv3 code is shipped as part of a user product, it must be possible for the user to replace that GPLv3 code. That's a problem if your device is intended to be locked down enough that it can only run vendor code. The second is that it grants an explicit patent license to downstream recipients, permitting them to make use of those patents in derivative works.
One of the consequences of these obligations is that companies whose business models depend on either selling locked-down devices or licensing patents tend to be fairly reluctant to ship GPLv3 software. In effect, this is GPLv3 acting entirely as intended - unless you're willing to guarantee that a user can exercise the freedoms defined by the free software definition, you don't get to ship GPLv3 material. Some companies have decided that shipping GPLv3 code would be more expensive than either improving existing code under a more liberal license or writing new code from scratch. Android's a pretty great example of this - it contains no GPLv3 code, and even GPLv2 code (outside the kernel) is kept to a minimum.
Which, given Canonical's focus on pushing Ubuntu into GPLv3-hostile markets, makes the choice of GPLv3 an odd one. This isn't a problem as long as they're the sole copyright holder, because the copyright holder is obviously free to ship their code under as many licenses as they want. But Canonical still aim to foster community involvement, and ideally that includes accepting external contributions to their code. If Canonical simply accepted those contributions under GPLv3 then they'd no longer have the right to relicense the entire codebase, so any contributions are only accepted if the contributor has signed a Contributor License Agreement.
Canonical's CLA is pretty simple. In essence, it grants Canonical the right to use, modify and distribute your code, and it grants Canonical a patent license under any patents you own that may cover the code in question. But, most importantly, it grants Canonical the right to relicense your contribution under their choice of license. This means that, despite not being the sole copyright holder, Canonical are free to relicense your code under a proprietary license.
Given Canonical's market goals, this makes sense. They can relicense Mir (and any other GPLv3 projects they own) under licenses that keep their hardware partners happy, and they can ship in the phone market. Everyone's a winner.
Except, if Canonical want to ship proprietary versions, why not just license Mir under a license that permits that in the first place? This is where the asymmetry comes in. The Android userland is released under a permissive license that allows anyone to take Google's code, modify it as they wish and ship it on whatever hardware they want. I could legally start a company that provided customised versions of Android to phone vendors without them having any GPLv3 concerns. I won't be able to do that with Ubuntu Phone.
I'm a fan of GPLv3. I think the provisions it contains to support user freedom are important. I hate the growing trend of using free software to build devices that are, effectively, impossible for the end user to modify. If Canonical were releasing software under GPLv3 because of a commitment to free software then that would be an amazing thing. But it's pretty much impossible to square the CLA's requirement that contributors grant Canonical the right to ship under a proprietary license with a commitment to free software. Instead you end up with a situation that looks awfully like Canonical wanting to squash competition by making it impossible for anyone else to sell modified versions of Canonical's software in the same market.
Canonical aren't doing anything illegal or immoral here. They're free to run their projects in any way they choose. But retaining the right to produce proprietary versions of external contributions without granting equivalent reciprocal rights isn't consistent with caring about free software or contributing to the wider Linux community, especially if it means you get to exclude those external contributors from the market you're selling their code into.
(Edit to add: a friend in the contracting industry points out that it also prevents vendors who won't ship GPLv3 from using external contractors to work on Mir - they have to go to Canonical, because only Canonical can relicense contributions under a proprietary license.)
[1] Right now Ubuntu Phone is using Surfaceflinger, the Android display server, but that's apparently just an interim solution.
June 19, 2013 10:50 PM
Lots of hitting already-found bugs today (sync() lockups, btrfs lockdep report, btrfs warnings).
Things are bad enough I’m having to avoid running certain things during testing.
Spent some time looking through various upstream trees to ascertain just how much stuff is fixed, and pending merge. Results inconclusive, but I get the feeling a lot of people are waiting for 3.10 before merging, which is annoying.
Rebuilt some older test machines to get some 32-bit in the mix. Hit a bug straight away. Again, one that has already been reported a month ago. Nngh.
Started paving the way for merging some of my linked-list debugging improvements, starting with some cleanups to get rid of __list_for_each, now that it’s redundant. (It’s identical to the regular list_for_each() now that the prefetching has been removed).
Also started prepping some of the other debug patches I’ve been sitting on for a while.
Daily log June 18th 2013 is a post from: codemonkey.org.uk
June 19, 2013 06:07 AM
June 18, 2013
[Note, if you're looking for an alternative, you might try VigLink; I'm giving that a shot now to see how it goes.]
Well, it happened. First and foremost, I’ve always tried to make my blog interesting to readers interested in technology & energy, and in the process I’ve sometimes linked out to relevant products on Amazon, to make me a little beer money. I’ve tried not to be too annoying or gratuitous about it, but it did help a little to offset the ISP charges etc. But today I got this email:
We are writing from the Amazon Associates Program to notify you that your Associates account will be closed and your Amazon Services LLC Associates Program Operating Agreement will be terminated effective June 30, 2013. This is a direct result of the unconstitutional Minnesota state tax collection legislation passed by the state legislature and signed by Governor Dayton on May 23, 2013, with an effective date of July 1, 2013. As a result, we will no longer pay any advertising fees for customers referred to an Amazon Site after June 30 nor will we accept new applications for the Associates Program from Minnesota residents.
As near as I can tell, Amazon has neatly evaded the law, which added:
(b) A retailer is presumed to have a solicitor in this state if it enters into an agreement with a resident under which the resident, for a commission or other substantially similar consideration, directly or indirectly refers potential customers, whether by a link on an Internet Web site, or otherwise, to the seller.
So: Chuck out all the affiliates, collect no tax, done and done. The state is no better off, and the bloggers in the state are worse off. This is exactly what has happened in other states, so it should come as no surprise to our esteemed legislators. I get it that states are hurting from dropping sales tax from brick and mortar stores and are looking for solutions, but it should have been obvious to anyone paying attention that this law would have very little effect when it’s this simple for places like Amazon to avoid it.
I was tempted to purge all links to Amazon from the blog – why send my good readers there for free? ;) But going forward, I guess I’ll try VigLink, which is sort of an affiliate of affiliates, and seems immune from this kind of thing, at least for now. It looks trivial to switch over to w/o needing to go fix up any existing articles. Hopefully it won’t make me look too craven; I’ll fine tune it as we go along.
June 18, 2013 01:52 PM
June 14, 2013
- Switched out a broken SSD in test machine for a hybrid drive. Hopefully it’ll last longer even if it is slightly slower.
- Hit a trinity bug where a child process would sleep for a really long time, and the watchdog had already exited. Reworked things so the watchdog never leaves while there are child processes still running, so that it can SIGKILL ‘stuck’ children. Stupid bug.
- Spent some time cleaning up the linked list uses in trinity. Not happy with the resulting patch so didn’t commit it. Maybe next week.
- Short bi-weekly kernel meeting in #fedora-meeting. Nothing exciting.
Looked over my 3.10-rc5 outstanding issues. Some more patches got merged, so things are starting to look better, as long as I don’t find any new problems next week.
Daily log June 14th 2013 is a post from: codemonkey.org.uk
June 14, 2013 08:10 PM
| | 17 | 18 | 19 | rawhide | Total |
|---|---|---|---|---|---|
| Open: | 244 | 424 | 134 | 72 | (874) |
| Opened since 2013-06-07 | 2 | 25 | 12 | 7 | (46) |
| Closed since 2013-06-07 | 10 | 9 | 3 | 1 | (23) |
| Changed since 2013-06-07 | 16 | 39 | 32 | 13 | (100) |
Weekly Fedora kernel bug statistics – June 14th 2013 is a post from: codemonkey.org.uk
June 14, 2013 04:06 PM
June 13, 2013
I have the pleasure of moderating the Fedora Project Board Town Hall today, 1900 UTC, having served on the board for five years previously. Held on IRC, these Town Halls give project members a chance to ask questions directly of the five Board candidates, so that you can make a more informed decision when casting your vote. I hope you can join us.
June 13, 2013 03:38 PM
- Did some more work on my test harness for setting up an assortment of disk configurations.
- While testing it, hit a bug during RAID5 reconstruction. This was a pain to get a trace out of, because it hits a BUG_ON. When that happens usb-serial stops spitting out characters, which seems suboptimal, but might be tricky to fix. I ended up just converting it to a WARN_ON to get the trace. A patch fixing this has existed for almost a month. sigh. At least it got merged by the end of the day.
Daily log June 12th 2013 is a post from: codemonkey.org.uk
June 13, 2013 04:58 AM
June 12, 2013
- Hit an AES bug that prevented my laptop booting 3.10-rc. Reported to the author of the only relevant change, and got a patch back quickly which solved it. If only all bugs had that kind of turnaround time.
- Poked at my fedora kernel bug triage script. Realised that right now, it’s incomplete because python-bugzilla doesn’t support setting keywords, only querying. Someone with python-fu want to implement that (and component reassign) ?
- Finally got fed up with all the reports from the soft lockup detector from users of vmware/kvm, and wrote a patch to automatically disable it there. It appears that if you create enough load in the host, guests don’t get scheduled within the time range necessary for the softlockup detector to keep ticking.
Daily log June 11th 2013 is a post from: codemonkey.org.uk
June 12, 2013 04:03 AM
June 11, 2013
I posted a new screencast covering the ease-of-use features that are new in Fedora 18.
10 New Features in LIO and targetcli
- Easier storage->ACL setup
- Name shows up as LUN model name
- Tags for initiator aliases and grouping
- ‘info’ command
- IPv6 portal support
- WWNs normalized
- Only show HW fabrics that are present
- 10 previous configs saved
- More info in summary
- iSER support
- Better sorting
June 11, 2013 04:38 PM
- Usual Monday morning email catch-up.
- Rebased my testing trees to rc5, restarted tests.
- Watched the Apple keynote. Wondering how long before we get a “new macbook air doesn’t work on Linux” bug report.
- More RCU bugs, which was actually a repeat of an -rc2 era bug. Hitting old bugs again is actually giving me some hope that there’s not anything too scary remaining in this code. Spent a while detangling all the different RCU/NOHZ bugs reported over the last few weeks.
- While chasing that stuff down, hit a new ftrace bug.
My 3.10-rc5 outstanding issues.
Daily log June 10th 2013 is a post from: codemonkey.org.uk
June 11, 2013 04:12 AM
Linus posted a git one-liner this morning that ended up intriguing me.
git log --pretty=%aD --author=davej@redhat.com | cut -c1-3 | sort | uniq -c | sort -n
The output looks like this..
37 Sat
42 Sun
50 Fri
73 Wed
78 Mon
79 Thu
112 Tue
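For anyone who would rather do the counting outside the shell, here is a rough Python equivalent of the `cut | sort | uniq -c | sort -n` part of that pipeline. It is only a sketch: `commits_per_weekday` is a made-up helper name, and the sample dates are invented rather than taken from a real repo. In practice you would feed it the lines printed by `git log --pretty=%aD`.

```python
from collections import Counter


def commits_per_weekday(date_lines):
    """Count commits per weekday from `git log --pretty=%aD` style lines.

    Each line looks like "Tue, 11 Jun 2013 01:17:00 -0400"; the first three
    characters are the weekday, which is what `cut -c1-3` extracts.
    Returns (weekday, count) pairs sorted by ascending count, like `sort -n`.
    """
    counts = Counter(line.strip()[:3] for line in date_lines if line.strip())
    return sorted(counts.items(), key=lambda kv: kv[1])


# Invented sample input, standing in for real `git log` output.
sample = [
    "Tue, 11 Jun 2013 01:17:00 -0400",
    "Tue, 4 Jun 2013 03:58:00 -0400",
    "Tue, 18 Jun 2013 13:52:00 -0400",
    "Wed, 12 Jun 2013 04:03:00 -0400",
    "Wed, 5 Jun 2013 03:40:00 -0400",
    "Mon, 10 Jun 2013 09:00:00 -0400",
]

for day, count in commits_per_weekday(sample):
    print(f"{count:7d} {day}")
```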
What I found interesting, is that on every single git repo I ran that command on, Tuesday was my most productive day.
As much as I hate Mondays, I think the real reason there is that I treat Mondays as ‘catch up’ day for the most part. As can be clearly seen, I don’t really work on weekends any more, so Mondays tend to be dealing with a backlog of email/git commits, bringing kernels on test machines up to date, and paging in context from whatever I was working on the prior week.
git commit statistics. is a post from: codemonkey.org.uk
June 11, 2013 01:17 AM
June 10, 2013
If you are not a criminal, please publish your PIN, bank account password, email/facebook passwords and complete medical history. You don't have anything to hide, do you?
June 10, 2013 09:00 AM
June 08, 2013
At the end of the 2013 legislative session in Minnesota, legislators passed an omnibus energy bill which included, among other things, a requirement that investor-owned utilities in Minnesota (Read: Xcel Energy) must generate 1.5% of their electricity from solar by 2020. There were a lot of other things in there as a result of the sausage law-making process for the solar mandate, including some that I’m not very fond of, but the bottom line of encouraging more solar development is a good thing in my book. (Also, it was signed into law on my birthday!)
1.5% doesn’t sound like a whole lot, but what does it really mean in terms of physical solar PV deployments? Numbers have been tossed around that this will require 450MW of new capacity in the next 7 years.
Assuming the 450MW number is correct, and picking 250W panels as a common panel size today, that’s 450,000,000 / 250 = 1,800,000 or 1.8 million panels installed by 2020. That’s about 700 panels installed every day for 7 years.
If commodity sized (65x39cm) panels are used, that’s about 112 acres of panels (if they were laid out flat and edge to edge, which of course they aren’t) ;) That’s roughly equivalent to 112 US football fields.
Is this possible? Sure. Austria installed 230MW in 2012 alone. New Jersey installed 415MW in 2012. And Minnesota gave itself 7 years to accomplish this goal.
Is 450MW the right number? According to the NREL PVWatts calculator for Minneapolis, 450MW of optimally situated, fixed solar PV could be expected to generate 578,512 MWh of solar energy in the course of a year.
According to the EIA energy data browser, all utilities (including co-ops etc) in Minnesota generated 42,586,000 MWh in 2012. 578,512MWh is about 1.4% of that number. Xcel is by far the largest generator, so if we take out the smaller co-ops etc, 450MW does seem like a reasonable ballpark number.
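The back-of-the-envelope numbers above are easy to re-run. The snippet below just repeats that arithmetic using the figures quoted in this post (450MW, 250W panels, 7 years, the PVWatts and EIA totals); the variable names are mine, and the capacity factor at the end is an extra derived figure, not something from PVWatts directly.

```python
CAPACITY_W = 450_000_000      # 450MW of new solar capacity
PANEL_W = 250                 # assumed common panel size
YEARS = 7                     # time allowed to reach the goal
ANNUAL_SOLAR_MWH = 578_512    # PVWatts estimate for 450MW in Minneapolis
STATE_MWH_2012 = 42_586_000   # EIA figure for all Minnesota utilities, 2012

panels = CAPACITY_W // PANEL_W
panels_per_day = panels / (YEARS * 365)
share_of_generation = ANNUAL_SOLAR_MWH / STATE_MWH_2012
# Fraction of the theoretical maximum (450MW running all 8760 hours of a year).
capacity_factor = ANNUAL_SOLAR_MWH / (CAPACITY_W / 1_000_000 * 8760)

print(f"panels needed:       {panels:,}")
print(f"panels per day:      {panels_per_day:.0f}")
print(f"share of generation: {share_of_generation:.2%}")
print(f"capacity factor:     {capacity_factor:.1%}")
```

Swapping in a different panel wattage or deadline shows how sensitive the "panels per day" figure is to those two assumptions.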
There are already large companies ready to jump at this. Geronimo Energy has submitted a proposal to provide up to 100MW of capacity at up to 31 sites ranging from 2 to 10MW. I honestly hope this isn’t the predominant mode of development. We have an awful lot of flat roofs which would be well suited – for example, Ikea put 1MW on their Minnesota store last year. 100 acres or so isn’t all that much land, but I’d still rather see this go up on the built environment before we start using farmland & green space.
I’m excited to see how this works going forward. Will my friends in the small-scale solar installation business stay busy? Will SolarCity come to town? Will companies like Geronimo make up the bulk of this with giant installations? Will it reduce the need for new gas peaker plants? Time will tell, but it’s an exciting time for solar in Minnesota, for sure.
June 08, 2013 10:23 PM
- Woke up to a patch fixing the NETLINK_MMAP corruption bug I’ve been chasing. Seems to work.
- Some more polishing on trinity towards a new release (next week possibly).
My 3.10-rc4 outstanding issues.
- top reports cpus maxed while idle (Fix is in linux-tip:timers/urgent).
- XFS log recovery bug
- Trinity triggered bugs:
- NETLINK mmap corruption bug.
- udpv6 oops (Been around for a long time, still no fix).
- Assorted RCU/NOHZ_FULL bugs.
- Assorted perf bugs.
- T430s Lid events no longer put machine to sleep (actually an old bug since 3.9-rc1, haven’t had time to bisect)
Daily log June 7th 2013 is a post from: codemonkey.org.uk
June 08, 2013 04:07 AM
June 07, 2013
| | 17 | 18 | 19 | rawhide | Total |
|---|---|---|---|---|---|
| Open: | 250 | 417 | 126 | 66 | (859) |
| Opened since 2013-05-31 | 0 | 26 | 17 | 1 | (44) |
| Closed since 2013-05-31 | 16 | 11 | 5 | 2 | (34) |
| Changed since 2013-05-31 | 41 | 41 | 26 | 11 | (119) |
Weekly Fedora kernel bug statistics – June 07 2013 is a post from: codemonkey.org.uk
June 07, 2013 05:54 PM
- Despite yesterday’s SSD death, turned out the XFS bug I was seeing is actually an XFS bug.
- Noticed the busy-while-idle bug again. Gathered info and pinged upstream. Annoying. Tested a patch currently in linux-tip.
- Reinstalled test machine with spare SSD.
- Once that was done, I chased down the root cause of the mystery mmap corruption. Investigation ongoing with upstream author, though it looks like this is 3.10-rc only. More details on that soon. Chasing this down has taken all week, and highlighted a number of areas I need to improve trinity in.
Daily log June 6th 2013 is a post from: codemonkey.org.uk
June 07, 2013 05:33 AM
June 06, 2013
- Another SSD died (Intel 520 this time). Annoyingly the one that I was chasing an XFS bug in. Lots of time spent transplanting it into different machines to try and salvage uncommitted changes without luck. RMA time again.
- Bad allergies combined with a headcold made the rest of the day a sudafed & red bull induced haze.
daily log June 5th 2013 is a post from: codemonkey.org.uk
June 06, 2013 02:36 PM
June 05, 2013
- Managed to hit the XFS ‘dir block has an entry not in the hash index’ bug again. Just like last time, it prevented booting, and running xfs_dump (or xfs_repair) on the block device from the rescue shell locked up. Ended up having to remove the SSD and put it in another machine over USB.
Created a xfs dump for Dave Chinner to ponder over.
- Built 3.10-rc4 for rawhide
- Continued attempts at narrowing down weird mmap bug.
- Prepping some more debug patches for rawhide over the next few days.
daily log June 4th 2013 is a post from: codemonkey.org.uk
June 05, 2013 03:40 AM
It is my pleasure to attend the HITCON 2013 and COSCUP 2013 conferences in July/August this year. They are both in Taipei. HITCON is a hacker/security event, while COSCUP is a pure Free/Open Source Software conference.
At both events I will be speaking about the growing list of GSM-related tools that are available these days, like OpenBSC, OsmocomBB, SIMtrace, OsmoSGSN, OsmoBTS, OsmoSDR, etc. As they are both FOSS projects and useful in a security context, this fits well within the scope of both events.
Given that I'm going to be back in Taiwan, I'm very much looking forward to meeting old friends and former colleagues from my Openmoko days in Taipei. God, do I miss those days. While terribly stressful, they still are the most exciting days of my career so far.
And yes, I'm also going to use the opportunity for a continuation of my motorbike riding in this beautiful country.
June 05, 2013 02:00 AM
June 04, 2013
- Spent pretty much the whole day chasing one bug. Which meant a bunch of trinity experiments to try and narrow down the cause. Still no real results, but feels like I’m getting closer. Need to start thinking about more generic ways to do this frequent “narrow down test cases” problem.
- Merged a couple pending trinity patches.
daily log June 3rd 2013 is a post from: codemonkey.org.uk
June 04, 2013 03:58 AM
June 03, 2013
I read Máirín Duffy’s coverage of the Fedora Board’s userbase discussion. Really interesting. I wanted to add my take.
tl;dr: Puppet/Chef make Fedora’s short support period much less of an issue.
The OS is a building block
I’ve been watching a lot of videos on DevOps lately. Several close friends of mine are sysadmins and I’ve been learning a lot from them about the transformation that their profession is undergoing. From this year’s ChefConf, Adam Jacob’s keynote and the talk by Sascha Bates really impressed on me the big change in how admins should view machines — They’re not permanent, or even semi-permanent. They are ephemeral snowflakes that may live a year, or just an hour, so don’t get too attached.
Part of why admins like VMs is because of the isolation they provide between different services. I used to run mail, DNS, and httpd from a single machine. Everything was mostly separate but not quite everything. They had separate userids but everyone’s config was in /etc, even touching the same files, sometimes. A full disk affected everybody. /var/log/messages didn’t split up their logging cleanly (by default, anyway.) It all just was built assuming there would be an admin at a command shell who could use their brain to resolve the conflicts to make everything play nice on a single OS image.
One service per instance
Admins adopted VMs for isolation, increased density, and better per-service resource allocation, but then ran into other problems. The setup that they did by hand once per new-hardware now was once per instance. (Editing /etc/sudoers for the fiftieth time gets old.) The tools then evolved further, until today one may keep no persistent state in an instance. Now, an instance is kickstarted into existence and configured automatically for the one job it will ever do. The sysadmin’s job isn’t to herd boxen any more, it’s to build ‘em, run ‘em, and then reap ‘em.
All the OS mechanisms for co-existing server processes, they’re now either obsolete or vestigial to some degree. What is important is the malleability of the OS to assume all the tasks it may be asked to – rather like a stem cell needs to be able to become a nerve or muscle, but never needs to be both.
Never upgrade, just redeploy
Let’s come back to the odd fact that Fedora is both a precursor to RHEL, and yet almost never used in production as a server OS. I think this is going to change. In a world where instances are deployed constantly, instances are born and die but the herd lives on. Once everyone has their infrastructure encoded into a configuration management system, Fedora’s short release cycle becomes much less of a burden. If I have service foo deployed on a Fedora X instance, I will never be upgrading that instance. Instead I’ll be provisioning a new Fedora X+1 instance to run the foo service, start it, and throw the old instance in the proverbial bitbucket once the new one works.
Cheap and easy virt and config management gives admins what they’ve always wanted — stability when they want it (run a LT support distro image, or for the VM host) or the latest stuff for their fast-moving business-oriented instances, by running a fast-update or rolling-release distro.
What Fedora should do
We’re already working on some of these to some degree — I think we should try to do even more to ensure Fedora is useful for the fast-update instance role.
First, Fedora needs to be able to be small. Nobody’s going to read the manpages on a throwaway instance, nobody’s even going to run vi. Image size matters when multiplied for each instance. Can we get by without /usr/share/doc/* and its thousands of copies of the GPL text? Fedora seems pretty good but there must be more we can do.
Second, we need to ensure Fedora supports the packages people are really using these days. Latest Ruby. Latest OpenStack. Vagrant. Django. Chef. Puppet. All the weird JS stuff that’s popular now on GitHub.
Continue to improve packaging tools so it’s easier for new contributors to do their first package, as well as for long-time packagers to maintain more packages. And not just package for contribution to Fedora, but for admins to package for solely internal distribution. Like Sascha Bates stresses in her talk, packaging is a huge benefit to automation, but it does require effort. It can be easier.
Finally, I think we need to continue to look at how easy it is to configure and manage an instance of the OS, and tailor it more for automated configuration. I believe the key to this is adding programmatic interfaces where they are lacking. See my “All Plumbing needs an API” talk. Since we’re probably being configured by another piece of code rather than a person at the shell, we need clear, unambiguous programmatic interfaces with good error handling. Chef should not be calling cmdline tools and checking error codes, there should be a Ruby configuration library that natively controls the whatever-it-is directly! We want configuring Fedora to be fast, straightforward, and reliable.
Conclusion: Stable+fast-update is better than stable+self-built
Practically the whole history of Linux distros has been the conflict between stability and new features. With virtualization, one still must make this choice, but at a much finer granularity than before. If you’re going to re-instance within 6 months anyways, why manually build your latest-Ruby and whatnot to support your app on top of a stable distro image? Maybe just use Fedora for those.
June 03, 2013 04:00 PM
Since I wrote this, we've made some worthwhile progress on avoiding damaging Samsung hardware. The first is that the samsung-laptop driver appeared to be causing the firmware to attempt to write to an area of memory that was marked in the chipset, triggering a Machine Check Exception. That was what generated the pstore output that caused the problem originally. The driver now refuses to load if EFI is enabled, which avoids the problem. It's not ideal, since it's currently the only mechanism we have for certain functionality on Samsung laptops, but there you go.
The second problem was that avoiding crashing on boot didn't actually fix the problem in any fundamental way. Even with pstore disabled, it was possible for userspace to fill the nvram and trigger the same problem. Our first approach to this was to prevent any writes to nvram if the UEFI QueryVariableInfo() call reported that more than 50% of the nvram storage space would be used. That was safe, but led to another issue. The nvram storage area is typically implemented as part of the same flash chip as the firmware. Flash isn't arbitrarily accessible - changing the contents of a block typically involves rewriting the entire block. It's impractical to rewrite the entire nvram area on every write, so what actually happens is that deleting variables just results in them being marked as inactive but doesn't actually free up the space. The firmware can later perform some sort of garbage collection to free it up.
This caused us problems, since inactive space that hasn't been garbage collected yet isn't actually available, and as a result firmware implementations tend to count it as used. Say you had 64KB of nvram and wrote 32KB of variables. We'd then refuse to write any more because you'd drop below 50%. So you delete 16KB of the variables you've created and try again. Unfortunately, the firmware still thinks that there's 32KB in use and Linux would still refuse.
If you were lucky, rebooting would trigger a garbage collection run. If you weren't, it wouldn't. Problematic. Our next approach was to try to account for the space actually actively used by the variables, rather than relying on what the firmware told us via QueryVariableInfo(). This seems simple enough - just add up the size of all the variables and subtract that from the overall size to determine how much of the "used" space is actually just old inactive variables that can be ignored. However, there's still some problems there. The first is that each variable has some additional overhead associated with it, and the size of that overhead varies depending on the system vendor. We had to make a conservative guess, which could cause problems if systems had large numbers of small variables. The second is that the only variables the kernel can see are those that are flagged as runtime-visible. There may also be a significant quantity of nvram used to store variables that are only visible in boot services code. We could work around this by adding up sizes while we're still in boot services code, but on some systems calling QueryVariableInfo() before ExitBootServices() results in later calls to GetNextVariable() jumping to invalid addresses and crashing the kernel. Not a great approach.
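The accounting idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 64-byte per-variable overhead and the variable sizes are all made up for the example, and real firmware does not expose its overhead at all.

```python
KIB = 1024


def estimate_active_bytes(variable_sizes, per_variable_overhead):
    """Sum the visible variables plus an assumed fixed overhead for each one."""
    return sum(size + per_variable_overhead for size in variable_sizes)


def estimate_inactive_bytes(total_bytes, remaining_bytes, variable_sizes,
                            per_variable_overhead):
    """Guess how much of the firmware's 'used' space is deleted-but-not-collected."""
    firmware_used = total_bytes - remaining_bytes
    active = estimate_active_bytes(variable_sizes, per_variable_overhead)
    # Never report a negative amount if the overhead guess was too generous.
    return max(firmware_used - active, 0)


# The 64KB example from above: the firmware claims 32KB is used, but only
# 16 variables of 960 bytes each (plus a hypothetical 64 bytes of overhead) are live.
sizes = [960] * 16
inactive = estimate_inactive_bytes(64 * KIB, 32 * KIB, sizes, 64)
print(inactive)
```

Both weaknesses described above show up directly in this sketch: the result is only as good as the overhead guess, and any variables that are invisible at runtime are missing from the list of sizes, so their space gets miscounted as reclaimable.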
Meanwhile, Samsung got back to us and let us know that their systems didn't require more than 5KB of nvram space to be available, which meant we could get rid of the 50% value and replace it with 5KB. The hope was that any system that booted with only 5KB of space available in nvram would trigger a garbage collection run. Unfortunately, it turned out that that wasn't true - some systems will only trigger garbage collection if the OS actually makes an attempt to write a variable that won't otherwise fit.
Hence this patch. The new approach is to ask the firmware how much space is available. If the size of the new variable would reduce this to less than 5K, we attempt to create a variable bigger than the remaining space. This should cause the firmware to realise that it's out of room and either (depending on implementation) perform a garbage collection run at runtime or set a flag that will cause the system to perform garbage collection on the next reboot. We then call QueryVariableInfo() again to see whether a garbage collection run actually happened, and if so check whether we now have enough space. If so, we go ahead and write the variable. If not, we tell userspace that there's not enough space.
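The decision flow described above can be sketched as a toy model. This is not the kernel patch itself: ToyFirmware, its fields and try_write are hypothetical names, and the firmware behaviour (an oversized write fails and either collects garbage immediately or flags it for the next reboot) is a simplification of what real implementations do.

```python
MIN_FREE_BYTES = 5 * 1024  # the 5KB reserve mentioned above


class ToyFirmware:
    """Toy nvram store: deleted variables stay 'inactive' until garbage collection."""

    def __init__(self, total, active, inactive, gc_at_runtime):
        self.total = total
        self.active = active
        self.inactive = inactive
        self.gc_at_runtime = gc_at_runtime
        self.gc_pending_on_reboot = False

    def query_remaining(self):
        # Stand-in for QueryVariableInfo(): inactive space still counts as used.
        return self.total - self.active - self.inactive

    def set_variable(self, size):
        if size > self.query_remaining():
            # Out of room: collect now, or only set a flag for the next reboot.
            if self.gc_at_runtime:
                self.inactive = 0
            else:
                self.gc_pending_on_reboot = True
            return False
        self.active += size
        return True


def try_write(fw, size):
    """Write a variable only if at least MIN_FREE_BYTES would remain afterwards."""
    remaining = fw.query_remaining()
    if remaining - size < MIN_FREE_BYTES:
        # Deliberately oversized write, expected to fail, to provoke garbage collection.
        fw.set_variable(remaining + 1)
        remaining = fw.query_remaining()
        if remaining - size < MIN_FREE_BYTES:
            return False  # tell userspace there is not enough space
    return fw.set_variable(size)


fw = ToyFirmware(64 * 1024, 16 * 1024, 44 * 1024, gc_at_runtime=True)
print(try_write(fw, 1024))
```

In the toy model a firmware that only collects on reboot makes the same call return False, which matches the "tell userspace that there's not enough space" branch above.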
This seems to work in all the situations I've tested, and it should avoid ending up in a situation where a Samsung can end up bricked. However, it's firmware, so who knows whether it's going to break things for someone else.
June 03, 2013 03:25 PM
Today, very sad news has reached me: Atul Chitnis has
passed away. Most people outside of India will most likely not
recognize the name: He has been instrumental in pioneering the BBS
community in India, and the founder and leader of the Linux Bangalore
and later FOSS.in
conferences, held annually in Bangalore.
I myself first met Atul about ten years ago, and had the honor of being
invited to speak at many of the conferences he was involved in. Besides
that professional connection, we became friends. The warmth and
affection with which I was accepted by him and his family during my many
trips to Bangalore is without comparison. I was treated and accepted
like a family member, despite just being this random free software
hacker from Germany who is always way too busy to return the amount of
kindness.
Despite the 17 year age difference, there was a connection between the
two of us. Not just the mutual respect for each other's work, but
something else. It might have been partially due to his German roots.
It might have been the similarities in our journey through technology.
We both started out in the BBS community with analog modems, we both
started to write DOS software in the past, before turning to Linux. We
both became heavily involved in mobile technology around the same time:
He during his work at Geodesic, I working for Openmoko. Only in recent
years his indulgence in Apple products was slightly irritating ;)
Only five weeks ago I had visited Atul. Given the state of his health,
it was clear that this might very well be the last time that we meet
each other. I'm sad that this now actually turned out to become the
truth. It would have been great to meet again at the end of the year
(the typical FOSS.in schedule).
My heartfelt condolences to his family. Particularly to his wonderful
wife Shubha, his daughter Anjali, his mother and brother. [who I'm
only not calling by their name in this post as they deserve some privacy
and their identities are not listed on Atul's Wikipedia page].
Atul was 51 years old. Way too young to die. Yet, he has managed to
create a legacy that will extend long beyond his life. He profoundly
influenced generations of technology enthusiasts in India and beyond.
June 03, 2013 02:00 AM
June 01, 2013
This is scary. I wonder if it is worth filing security bugs against web browsers?
June 01, 2013 01:12 PM
May 31, 2013
- Bisection revealed the source of the mystery mmap bug. Still unclear as to why it happens. Mailed the upstream author.
- Replacement SSD turned up yesterday, so spent a while installing it, and reinstalling laptop. Noticed that they still ship new SSDs with out-dated firmware.
Spent a while creating a bootable USB key to upgrade it.
- bi-weekly #fedora-meeting.
- Finally got to the point where I had to install the aircon unit again.
Amusing URL of the day.
My 3.10-rc3 outstanding issues.
- tickbroadcast bootmem allocator trace.
- XFS xfs_setattr_size assertion pending fix
- XFS unhashed dirblock bug.
- Mystery unprivileged ‘crash other processes’ bug.
- Trinity triggered bugs:
- udpv6 oops (Been around for a long time, still no fix).
- Assorted RCU/NOHZ_FULL bugs.
- T430s Lid events no longer put machine to sleep (actually an old bug since 3.9-rc1, but laptop is out of action until next week)
daily log May 31st 2013 is a post from: codemonkey.org.uk
May 31, 2013 10:11 PM
Quite a few CVE’s this month. Made some progress through the btrfs backlog.
The 3.9 rebase revealed some new “Can’t boot” cases, which looks like yet another UEFI bios bug.
Monthly Fedora kernel bug statistics – May 2013 is a post from: codemonkey.org.uk
May 31, 2013 05:39 PM
| | 17 | 18 | 19 | rawhide | (total) |
| Open: | 267 | 404 | 110 | 65 | (846) |
| Opened since 2013-05-24 | 7 | 36 | 14 | 1 | (58) |
| Closed since 2013-05-24 | 14 | 11 | 8 | 4 | (37) |
| Changed since 2013-05-24 | 16 | 63 | 24 | 4 | (107) |
Weekly Fedora kernel bug statistics – May 31st 2013 is a post from: codemonkey.org.uk
May 31, 2013 05:32 PM
- end of month administrivia. status reports etc.
- booked flight for FLOCK.
- Last night’s storm knocked out power a few times. One machine didn’t come back up until I left it off overnight. Stranger still, one LCD wouldn’t power up until I replaced its IEC cable. Took advantage of the situation to recable a bunch of stuff in my office.
- Reorganisation included removal of several switches that were linked to other switches. Even though these were gigabit switches, latency between machines ‘feels’ subjectively better. I never took ‘before’ measurements to prove it one way or another.
- For no obvious reason, since moving from one side of the room to the other, firewall thinks ctrl is stuck on keyboard.
- Continued bisecting mystery mmap bug.
daily log May 30th 2013 is a post from: codemonkey.org.uk
May 31, 2013 04:43 AM
May 30, 2013
Up early. Went into office. On arrival, found that laptop wouldn’t boot (now I remember why this was the ‘backup’ laptop).
Spent an hour trying to coax it to boot before giving up and attempting a reinstall. Which failed.
Did some bugzilla triage on an ipad that I happened to have with me. (Surprisingly not a horrible way to do this, first time for everything). Noticed we seem to be getting more filesystem/vfs related bugs than we used to. Going to need to scrounge up some more disks/controllers soon I think.
Started feeling ‘not right’ around lunchtime. Skipped lunch. Headaches and nausea mid afternoon.
Not the most productive of days.
daily log May 29th 2013 is a post from: codemonkey.org.uk
May 30, 2013 12:59 AM
May 29, 2013
- email backlog of doom.
- Another week, another xfs bug.
- Found (yet another) perf/rcu bug. It never ends.
- Investigating a bug which appears to allow a user to kill other users’ (even root’s) processes. (All day, ongoing).
- Made a change to trinity to allow calls to syscalls with no arguments. Some other small fixes.
- Infrastructure work for some IO testing we’ll be doing soon.
- Reinstalling backup laptop (Dell Adamo) in absence of new SSD for thinkpad.
daily log May 28th 2013 is a post from: codemonkey.org.uk
May 29, 2013 03:28 AM
May 28, 2013
There's now no shortage of Linux distributions that support Secure Boot out of the box, so that's a mostly solved problem. But even if your distribution supports it entirely you still need to boot your install media in the first place.
Hardware initialisation is a slightly odd thing. There's no specification that describes the state ancillary hardware has to be in after firmware→OS handover, so the OS effectively has to reinitialise it again. This means that certain bits of hardware end up being initialised twice, and that's slow in some cases. The most obvious is probably USB, which has various timeouts as you wait for hardware to settle. Full USB support in the firmware probably adds a couple of seconds to boot time, and it's arguably wasted because the OS then has to do the same thing (but, thankfully, can at least do other things at the same time). So, looking for USB boot media takes time, and since the overwhelmingly common case is that users don't want to boot off USB, it's time that's almost always wasted.
One of the requirements for Windows 8 certified hardware is that it must complete firmware initialisation within a specific amount of time, something that Microsoft refer to as "Fast Boot". Meeting these requirements effectively makes it impossible to initialise USB, and it's likely that certain other things will also be skipped. If you've got a USB keyboard then this obviously means that your keyboard won't work until the OS starts, but even i8042 setup takes time and so some laptops with traditional PS/2-style keyboards may not set it up. That means the system will ignore the keyboard no matter how much you hammer it at boot, and the firmware will boot whichever OS it finds.
For a newly purchased device, that's going to be Windows 8. It's not too much of a problem with a fully installed Windows 8, since you can hold down shift while clicking the reboot icon and get a menu that lets you reboot into the firmware menu. Windows sets a flag in a UEFI variable and reboots the system, the firmware sees that flag and does full hardware initialisation and then drops you into the setup environment. It takes slightly longer to get into the firmware, but that's countered by the time you save every time you don't want to get into the firmware on boot.
So what's the problem? Well, the Windows 8 setup environment doesn't offer that reboot icon. Turn on a brand new Windows 8 system and you have two choices - agree to the Windows 8 license, or power the machine off. The only way to get into the firmware menu is to either agree to the Windows 8 license or to disassemble the machine enough that you can unplug the hard drive[1] and force the system to fall back to offering the boot menu.
I understand the commercial considerations that result in it ranging from being difficult to impossible to buy new hardware without Windows pre-installed, but up until now it was still straightforward to install an alternative OS without agreeing to the Windows license. Now, installing alternative operating systems on many new systems will require you to give up certain rights even if you want nothing other than to reach the system firmware menu.
I'm firmly of the opinion that there are benefits to Secure Boot. I'm also in favour of setups like Fast Boot. But I don't believe that anyone should be forced to agree to a EULA purely in order to be able to boot their own choice of OS on a system that they've already purchased.
[1] Which is a significant and probably warranty-voiding exercise on many systems, and that's assuming that it's not an SSD soldered to the motherboard…
May 28, 2013 09:41 PM
May 26, 2013
Now that the entire series is done I've figured a small overview would be in order.
Part 1 talks about the different address spaces that an i915 GEM buffer object can reside in and where and how the respective page tables are set up. Then it also covers different buffer layouts as far as they're a concern for the kernel, namely how tiling, swizzling and fencing work.
Part 2 covers all the different bits and pieces required to submit work to the gpu and keep track of the gpu's progress: Command submission, relocation handling, command retiring and synchronization are the topics.
Part 3 looks at some of the details of the memory management implemented in the i915.ko driver. Specifically we look at how we handle running out of GTT space and what happens when we're generally short on memory.
Finally part 4 discusses coherency and caches and how to most efficiently transfer between the gpu coherency domains and the cpu coherency domain under different circumstances.
Happy reading!
Update: There's now also a new article with a few questions and answers about some details in the i915 gem code.
May 26, 2013 02:42 PM
So apparently people do indeed read my i915/GEM crashcourse and a bunch of follow-up questions popped up in private mails. Since I'm a lazy bastard I've cleaned some of the common questions&answers up to be able to easily point at them. And hopefully they also help someone else to clarify things a bit.
Question: What’s the significance of i915_gem_sw_finish_ioctl? It seems to flush cpu caches, but only conditional on obj->pin_count != 0. Why does it not unconditionally flush the cpu caches like e.g. when we move an unsnooped/not LLC-cached object into a gpu domain?
Answer: i915_gem_sw_finish_ioctl is only used to flush out cpu rendering to the display (and in current userspace it's not used at all). obj->pin_count != 0 is used as a proxy for "this is a scanout buffer". Obviously more intelligent userspace should know whether it is doing cpu rendering to a displayed buffer or not and force the expensive clflushing with e.g. the set_domain ioctl only when really required. But the sw_finish ioctl is called from the libdrm cpu mmap unmap function, which does not have this knowledge at hand, hence the check in the kernel. Furthermore for efficient integration of cpu rendering into the gpu render pipeline we want to use snoopable objects even on non-LLC platforms which means that this ioctl shouldn't really be used any more for new code.
Question: So the cpu can only access a GEM object through the GTT when it's in the mappable part of the GTT, i.e. when gtt_offset + size <= gtt_mappable_end. But the i915_gem_object_set_to_gtt_domain function does not check whether this condition is satisfied and simply goes ahead with the domain change. Why is that, even though the cpu won't be able to access the buffer object at its current place?
Answer: The GTT domain is purely about coherency, i.e. a buffer object is in the GTT domain if reads/writes through the GTT would see the correct values. The other big domain is the cpu domain, i.e. the data (when accessed directly in the physical memory location, not going through the GTT) is coherent with cpu caches. Shifting between these two domains requires flushing/invalidating cpu caches.
Note that on recent kernels that doesn't even mean that there's a global GTT mapping allocated for that buffer object: This is used to optimize away the redundant cache flushing when moving an object around, e.g. when moving it into the mappable range to serve a cpu access page fault. In the future this will be even more common once we have proper per-process GTT address spaces. Then an object could be fully coherent with the GTT domain, read by the gpu through a PPGTT mapping, but not have an offset allocated for it in the global GTT at all.
The mappable GTT address range on the other hand is a different concept and simply means the object has a GTT mapping visible to the cpu (on gpus without PPGTT the global GTT can be up to 2g, but only 256m are usually visible in the pci bar). Note that a GEM object can be mappable but (at the same time) be in the cpu domain. This happens when userspace writes to the buffer object through the cpu mappings.
Question: How does the i915_gem_fault function handle a page fault when it itself is invoked through a page fault in the i915 GEM kernel code? For example, suppose the fault_in_pages_readable function is called, which dereferences a user pointer - won't that cause issues with deadlocks?
Answer: Yes, this can happen and we need to be careful that we cannot possibly deadlock with our own pagefault handlers. And it's not just theoretical, it happens in the wild when a GL client tries to use a pointer obtained from one of the texture mapping functions (which can use a GTT memory mapping internally) to upload data (which could use the pwrite GEM ioctl).
These potential deadlocks are resolved by instructing the linux memory subsystem to not serve pagefaults when accessing userspace memory but instead fail it. Then our code can release any resources and locks required by our own page fault handler and retry the operation in a slowpath. Often this requires that we copy the data into a (unfaultable) temporary buffer in kernel's memory space. These atomic sections are often implicit, but we have a few places where we need to explicitly disable the page fault handler with pagefault_disable/enable() calls.
Question: Is obj->fenced_gpu_access ever set on modern platforms - it seems not? Or could this cause a stall waiting for the gpu when all fences are in use and we need a fence to handle a GTT page fault?
Answer: No, this is only ever set on Gen2/3 devices. Those gpus use the same GTT fences used on all platforms for detiling cpu access also for gpu access, at least for some gpu rendering functions. So this is irrelevant on modern platforms and can't lead to a stall in the pagefault handler when accessing an otherwise idle buffer object.
Question: What is this wedged stuff - there's lots of references to it in the i915 GEM code?
Answer: This is part of the gpu hang detection and reset handling code, which I didn't really cover in my crashcourse. It is set when we've detected a hang but failed to reset the gpu. It will cause all subsequent command submission from userspace to fail with -EIO, which is used by userspace as a signal to fall back to software rendering. The i915 hang detection and reset code has been (and still is) under pretty active development and is nowadays a rather complex piece of code. I plan to cover it more in-depth hopefully soon.
Question: In the use_cpu_reloc function, why is the obj->cache_level != I915_CACHE_NONE condition used?
Answer: That's just crazy optimization - it's always faster to write relocations through cpu maps if LLC caching is enabled. But without caching it's faster to use global GTT access - but then only if we have the mappable mapping already set up. Note that pwrite ioctl code has similar tricks.
May 26, 2013 02:40 PM
May 25, 2013
I wanted to look at how much the “clipping” behavior of power-limited solar microinverters affected my annual energy production. The TL;DR version is: at worst, only about 0.6% loss due to clipping. For more, read on.
A photovoltaic inverter is a device which converts the DC energy from the panel into AC energy for the grid; it also manages optimum power point tracking. Traditionally this was done with a big central inverter for all panels combined; recently companies such as Enphase Energy have started making microinverters, which are per-panel devices. One advantage of these devices is that each panel operates independently so that if one panel is shaded, damaged, or dirty, it doesn’t affect the rest of the array.
I have 11 230W Solar PV panels on my roof with an Enphase M190 microinverter on each. These are nominally 190W devices, though in practice they have a maximum output of 199W. (Note, these are 3 year old models; Enphase now has microinverters with more capacity). The fact that the panels can produce more than the microinverter can handle might seem like an issue; indeed on a cool, clear day we can see the effect:
So from the graph above it’s clear that I am losing a little energy production during that clipping. What would normally be a smooth curve is flattened out at the top as I hit the 11x199W = 2189W limit. (A few factors affect whether this clipping happens; obviously we need a clear day, but optimal sun angle and, perhaps more than anything, panel temperature affects it greatly).
I wanted to try to quantify this a bit – how much am I really losing from this behavior?
One cool thing about the Enphase units is that they report data every 5 minutes, and this data can be queried via a standard data API. So I pulled down the past 365 days worth of data to see how often I was clipping. I grabbed 5-minute data files for each day, and looked for when clipping seems to start, by looking at watt output around the clipping point, and how many 5-minute entries there were for each wattage:
$ egrep -w "21[789][0-9]" *.json | awk -F : '{print $8}' | sort | uniq -c
9 2170,"enwh"
10 2171,"enwh"
15 2172,"enwh"
11 2173,"enwh"
17 2174,"enwh"
17 2175,"enwh"
11 2176,"enwh"
20 2177,"enwh"
24 2178,"enwh"
14 2179,"enwh"
17 2180,"enwh"
21 2181,"enwh"
32 2182,"enwh"
51 2183,"enwh"
107 2184,"enwh" <-- actual clipping start?
166 2185,"enwh"
134 2186,"enwh"
119 2187,"enwh"
97 2188,"enwh"
62 2189,"enwh" <-- nominal clipping, 11x199
21 2190,"enwh"
2 2191,"enwh"
There’s a pretty big jump at 2184W, so I went with that as a definition of “when clipping starts” vs. the nominal 2189W. Adding up the occurrences of clipping, I got 708 5-minute intervals of clipping out of the last 365 days. That’s about 59 hours.
So how much energy is that? My panels can nominally make 11x230W = 2530W of output, so 2530-2184 = 346W lost, at most, during clipping. That’s actually too high; not every instance above is clipping, and not every interval would have been making the maximum output. So we’ll take that as a high estimate.
346W x 59 hours is 20,414 watt-hours, or about 20kWh. At around $0.10/kWh, that’s about $2.00 of lost value. Over the same year period, my array made 3,356 kWh, so 20kWh lost is about 0.6% of that. I would hope that the microinverters made up at least that much by virtue of keeping the array going over the winter when some panels were snow-covered, etc. Remember, many of my assumptions above make this a high estimate.
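For anyone who wants to check the arithmetic, here is a small sketch that redoes the estimate from the histogram above. The counts, the 5-minute interval, the 2184W threshold, the 230W panel rating and the 3,356 kWh annual figure all come from this post; the variable names are just for illustration.

```python
# Number of 5-minute intervals seen at each wattage from 2184W upward (from the histogram).
CLIP_COUNTS = {2184: 107, 2185: 166, 2186: 134, 2187: 119,
               2188: 97, 2189: 62, 2190: 21, 2191: 2}

PANELS = 11
PANEL_WATTS = 230          # nominal panel rating
CLIP_START_WATTS = 2184    # chosen definition of "clipping starts"
ANNUAL_KWH = 3356          # array output over the same year
PRICE_PER_KWH = 0.10       # approximate, in dollars

intervals = sum(CLIP_COUNTS.values())
hours = intervals * 5 / 60

# Worst case: assume the panels would have made full nominal output the whole time.
lost_watts = PANELS * PANEL_WATTS - CLIP_START_WATTS
lost_kwh = lost_watts * hours / 1000

print(intervals, "intervals =", hours, "hours")
print("lost at most", round(lost_kwh, 1), "kWh,",
      "about $%.2f," % (lost_kwh * PRICE_PER_KWH),
      "or %.1f%% of annual production" % (100 * lost_kwh / ANNUAL_KWH))
```

Since the lost wattage is an upper bound for every interval, swapping in a lower per-interval figure would only shrink the result.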
One other interesting datapoint is to see when this clipping occurred. By month, here’s how it looks:
March was far and away the highest; this is probably due in large part to the cooler temperatures, which make the panels more efficient.
May 25, 2013 09:08 PM
May 24, 2013
- Woke up to find a fix in my inbox for the XFS setattr bug. Bug turned out to be a few years old, and also highlighted a broken test in xfstests. Two bugs for the price of one.
- Applied the daily “fix trinity compile on old distros” bug.
- Briefly looked at Boehm-Demers-Weiser garbage collector with a view to maybe using it in Trinity in the absence of better allocation tracking.
- Updated a triage script to add all the aliases we have for fedora kernel bugs. Hoping that in time that script can grow to be even more useful.
- Tried out another potential fix for the RCU problems I’ve been seeing. Seems good so far.
- Looked at a bunch of “can’t boot” bugs that came in since F18 got rebased to 3.9. Found a thread upstream that seems to be discussing the same bug.
- Spent the afternoon compiling OCLint out of curiosity. Had no idea how long it would take, or how much it would increase the temperature in my office. After four hours of sitting in a simulated sauna, llvm finished building, but oclint wouldn’t build. Ran out of time to play with it. Something for next week.
- Seeding improvements from Kees to trinity.git
and now: three day weekend. \o/
daily log May 24th 2013 is a post from: codemonkey.org.uk
May 24, 2013 11:20 PM
In addition to turning your Fedora 18 box into an iSCSI target, LIO also supports other SCSI transport layers (‘fabrics’), such as Fibre Channel, with the qla2xxx fabric.
The most crucial bit is to verify that the qla2xxx driver has initiator mode disabled — it should be operating in target mode only. You can check this with:
cat /sys/module/qla2xxx/parameters/qlini_mode
It should say ‘disabled’. If it doesn’t, create a file called /usr/lib/modprobe.d/qla2xxx.conf and put:
options qla2xxx qlini_mode=disabled
in it. Then, run ‘dracut -f’ to rebuild your initrd, and reboot.
Some of you may be wondering: why /usr/lib/modprobe.d instead of /etc/modprobe.d ? This is because qla2xxx is likely loaded from the kernel’s initial ramdisk (initrd), and dracut, the initrd building tool, omits “host-specific” settings in /etc/modprobe.d. While you’re mucking around, also make sure the firmware package for your qla device, such as ql2200-firmware or similar, is also installed.
targetcli won’t let you create a qla2xxx fabric if qlini_mode is wrong. Once it lets you create the qla fabric, you can add luns to it and grant access permissions to acls exactly in the same manner as the other LIO fabrics.
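If you would rather script the check than eyeball it, something along these lines should do. It is only a sketch: the sysfs path is the one given above, while the function names are made up for the example, and it reports None when the qla2xxx module isn't loaded at all.

```python
from pathlib import Path

# Path taken from the instructions above.
QLINI_MODE_PATH = Path("/sys/module/qla2xxx/parameters/qlini_mode")


def initiator_mode_disabled(raw_value):
    """True if the parameter text says initiator mode is disabled."""
    return raw_value.strip() == "disabled"


def check_qla2xxx(path=QLINI_MODE_PATH):
    """Return True/False for the current setting, or None if it can't be read."""
    try:
        return initiator_mode_disabled(path.read_text())
    except OSError:
        # Module not loaded, or no permission to read the parameter.
        return None


if __name__ == "__main__":
    state = check_qla2xxx()
    if state is None:
        print("qla2xxx parameters not readable; is the module loaded?")
    elif state:
        print("initiator mode disabled, OK for target mode")
    else:
        print("initiator mode still enabled; set qlini_mode=disabled and rebuild the initrd")
```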
May 24, 2013 04:29 PM
| | 17 | 18 | 19 | rawhide | (total) |
| Open: | 276 | 380 | 104 | 66 | (826) |
| Opened since 2013-05-17 | 4 | 32 | 6 | 3 | (45) |
| Closed since 2013-05-17 | 3 | 11 | 4 | 3 | (21) |
| Changed since 2013-05-17 | 16 | 48 | 14 | 5 | (83) |
Weekly Fedora kernel bug statistics – May 24th 2013 is a post from: codemonkey.org.uk
May 24, 2013 04:10 PM
- Continued chasing the xfs bug. Confirmed that with the pending fix in the XFS tree, I can no longer reproduce the bug I saw. Then 3-4 hours later, it popped up again.
Annoying. Finally got some debug info to Dave Chinner at the end of the day.
Timezone differences making it even more annoying to debug.
- Fixed up a compile problem on older distros in trinity since the recent perf changes.
- Vince Weaver’s perf specific trinity fork is finding more bugs in the perf syscall already [1], [2].
- Merged trinity patch from Daniel Borkmann to add randomized seccomp filters generated by markov chain. Interesting stuff.
- Planning for some interesting testing work next week.
- Started some prep work for trinity 1.2
My 3.10-rc2 outstanding issues:
- RCU/NOHZ_FULL bugs [1], [2]
- RCU bootmem allocator trace.
- tickbroadcast bootmem allocator trace.
- XFS Slab corruption (pending fix)
- XFS xfs_setattr_size assertion
- T430s Lid events no longer put machine to sleep (actually an old bug since 3.9-rc1, but laptop is out of action until next week)
Puzzling website of the day: pain registers.
daily log May 23rd 2013. is a post from: codemonkey.org.uk
May 24, 2013 04:25 AM
May 23, 2013
Going to try and continue yesterday’s daily log format for a while.
- Grumbled at openvpn changing pathname for ‘plugin’ to ‘plugins’ breaking my vpn script.
- Bugzilla seemed unhappy. Gave up trying to look at it after it kept timing out.
- Continued poking at the XFS assertion from yesterday. Downgraded the compiler from f19’s 4.8.0 to 4.7.3. No luck. Couldn’t reproduce on 3.9, so started bisecting. Seemed to be caused by a patch I added recently to work around another XFS bug (slab corruption). I can’t win. Dave Chinner confused by my diagnosis. Bisect take 2 on that tomorrow.
- Vince Weaver posted a perf_event fuzzer based on trinity. Spent a while reading it over. Neat. Glad to see people taking an idea and running with it in new directions. The more test programs the better.
- Diagnosed yesterday’s “microcode loader got slow” bug. Turned out that I had somehow inadvertently set CONFIG_FW_LOADER_USER_HELPER, which incurs a 60 second timeout.
- While waiting for bisections, looked over some bugs in coverity’s database. Around 1500 untriaged. Would like to find time to work on that at some point.
Spent so much of the day bisecting/building/rebooting that I didn’t write much new code today. Ho-hum.
daily log May 22nd 2013. is a post from: codemonkey.org.uk
May 23, 2013 03:51 AM
May 22, 2013
As detailed in this blog post, I've expanded the set of man pages rendered in HTML at http://man7.org/linux/man-pages/ to include pages in addition to those provided by the man-pages project. This change has several purposes. One main purpose is to provide up-to-date and regularly updated HTML renderings of these man pages. (Most online man page renderings are out-of-date to some extent--in some cases, extremely out of date.) The other main purpose is to provide information on where to report bugs in each man page. To this end, each HTML rendering includes a COLOPHON that describes the origin of the page, notes the date when it was extracted, and provides information on where to report bugs in the page. (The man-pages project has already done this since December 2007, with the result that many more man page bugs are nowadays reported.)
Currently, man pages from nearly 40 projects are rendered, raising the number of pages rendered at man7.org from around 950 to around 1750. The projects that I have so far included have a bias that matches my interests: man-pages, projects related to low-level C and system programming (e.g., the ACL and extended attribute libraries), toolchain projects (e.g., gcc, gdb, Git, coreutils, binutils, util-linux), and other relevant tools (kmod, strace, ltrace, procps, expect) and tools relevant to manual pages (e.g., groff, man-db). The full list of projects and the corresponding man pages that are rendered can be found in the man pages by project index. I'm open to adding further projects to the rendered set, if they seem relevant. If you think there is a project that should be added, take a look at this blog post.
May 22, 2013 01:18 PM
Got back from vacation today (since last Thursday). Here’s how I spent the day.
- Caught up (skimmed) the 1500 postings to Linux-kernel and related mailing lists that had accumulated.
- Reviewed, applied and cleaned up my patch backlog for trinity.
- Caught up with direct mail that needed a response.
- Brought my test machines up to 3.10-rc2, and restarted tests.
- Caught another pair of RCU/nohz bugs pretty quickly. [1][2].
- Checked on the RMA for my failed SSD. Still awaiting shipment of replacement.
- Received my ultrabay adaptor for my thinkpad. Surprised to find out that a full height SSD would fit into it.
- Pushed out a 3.9.3 update for F18
- Looked at bugzilla backlog. Swore a lot. 3.9.x rebase bugs started to trickle in.
- Rewrote a bunch of code surrounding trinity’s rand() usage.
- Finally got F19 installed via NFS on new test machine.
- Hit an XFS assertion.
- Then hit an i915 pineview kms console blanking bug.
- Noticed that x86 microcode loading had gotten really slow. It seems to be waiting a whole 60 seconds for each core.
a day in the life.. is a post from: codemonkey.org.uk
May 22, 2013 03:36 AM
May 17, 2013
A few years ago, I gave a history of the 2.6.32 stable kernel, and
mentioned the previous stable kernels as well. I'd like to apologize for not
acknowledging the work of Adrian Bunk in maintaining the 2.6.16 stable kernel
for 2 years after I gave up on it, allowing it to be used by many people for a
very long time.
I've updated the previous post with this information in it at the bottom, for
the archives. Again, many apologies, I never meant to ignore the work of this
developer.
May 17, 2013 04:34 PM
May 16, 2013
At the Havana summit they were giving away a paper version of Joe Arnold's "Software Defined Storage with OpenStack Swift". Very useful book for anyone dealing with Swift, I would be glad to pay the cover price of $25. But even more interesting than the tips on care and feeding of Swift is how Joe opens the whole book:
[...] a de-coupled management system so customers could achieve (1) amazing flexibility in terms of how (and where) they deployed their storage, (2) control of their data without being locked-in to a vendor and (3) private storage at public cloud prices.
These features are the essence of Software Defined Storage (SDS), a new term the meaning of which is being defined. [...] Key aspects of SDS are scalability, adaptability, and the ability to use most any hardware. Through this de-coupling, operators can now make choices on how their storage is scaled and managed and how users can store and access data — all driven programmatically for the entire storage tier, regardless of where the storage resources are deployed.
Parts of the above prompt questions. Firstly, what good is de-coupling in respect to lock-in? SwiftStack effectively locks in by owning the de-coupled management. Sure, you own your data and could, in theory, manage your Swift with another management plane... I do not expect anyone to be crazy enough to try switching by anything less than standing up a new cluster. In any case, that part is not important, IMHO. The important part is programmatic control.
The phrase "SDS" jumps off "Software-Defined Networking". When SDN came into OpenStack, I was quite skeptical about it. It seemed too much like vendor-driven marketing bullshit. However, as users deployed the Project Formerly Known as OpenStack Quantum, it became clear that SDN answers their needs. The chief need was the ability to shape networks programmatically, overlaid on top of the physical networking plant, in service of the VMs.
Before SDN, when all this cloud thing came about, practitioners also struggled with the definition of it, and in particular the difference from the plain old datacenter virtualization. The difference is the programmatic control throughout. RHEV (now oVirt) eventually grew an API, which blurred the lines. But in OpenStack it was the main feature from the start. So you can manage everything and anything programmatically, including, for example, running on bare hardware. One can say that cloud is "Software-Defined Computing".
So, how does this programmatic thing apply to Swift? Joe had interesting insights cunningly hidden in the book, like these:
In an SDS system, reliability is the responsibility of the software, not the hardware. Replication and data integrity tactics are used to ensure that data does not become corrupt and that lost data is recovered.
[...]
A crucial function of an SDS system is to orchestrate capacity — storage, networking, routing & services — for entire cluster.
Swift covers the first part well already. The second is missing, or "de-coupled".
For galactic fairness, he also wrote things that seem wrong-headed to me:
There is no application sharding or managing volumes which can drive operational knowledge and complexity into applications because the SDS system is one cohesive system. Users do not need to ask for or know 'which storage pool' should be used because there is only one namespace.
The problem with hiding the pools outside of the namespace is that they become invisible to the programmatic control as well, and such control is essential to the very definition of SDS. Someone at Amazon made a brilliant decision to make buckets a unit of replication in S3, so they can be linked to a region. In effect this hides the complexity but exposes knowledge that an application needs. Thus, any S3 client can do what Joe considers SDS, but without any de-coupling, through the namespace and inside the API (or it can choose not to do it and just use a default region, for simplicity).
Joe's employees are hard at work implementing the vision as he outlined it, using the concept of regions that are internal to a Swift cluster. The problem for everyone else, however, is that the programmatic control of that stack is exclusive to SwiftStack (with some useful things leaking into Swift, such as changeable replica count).
So, in the end, today Swift offers a solid foundation and parts of an SDS system, but the orchestration is "de-coupled" away elsewhere. Seems like a clear challenge to OpenStack to (re-)create the missing pieces.
P.S. I'd love to see the missing parts inside the Swift API and even namespace, although we have a problem here. Our Accounts and Containers are not guaranteed to live anywhere specifically or even on the same nodes. Changing that would be a step that I prefer. But Joe prefers to give up on plugging programmatic orchestration into the Swift API and just "de-couple" the heck out of it. John, our benevolent PTL, seems to toe that line. Maybe they are right.
P.P.S. The deal with the programmatic orchestration is something that "unified" storage projects have to address too. E.g. in GlusterFS a program can issue mkdir(2). Is this programmatic control? No, not enough. Okay, they have glusterfsd nowadays, I can create volfiles in there, is that SDS? That is getting closer!
May 16, 2013 04:19 PM
My bad luck with hardware continues.
At the beginning of this year, I bought an SSD for my laptop.
I previously wrote about the need to update smartmontools, which should now be updated everywhere. One thing I was not aware of at the time however, was that there’s a firmware update available. Had I known this, I would have applied it, because as soon as I hit the “400GB of lifetime writes” counter (coincidence?), it lost the ability to write to any block. It won’t even respond to secure erase commands.
The failure is exacerbated by the fact that the disk contains journalling filesystems in need of recovery. So if anything tries to mount them, it tries to write to the disk, and then the disk falls off the bus, requiring a power cycle to even see it again. The recovery tools provided by OCZ apparently try to mount every partition they find during boot up (derp).
So now it’s on its way back to OCZ for reflashing/replacement. Lesson learned.
If you have one of these, and hdparm -I shows you have firmware 1.03, you might want to update it to 2.0. There are flashing tools on ocz’s site (in the form of bootable linux images, using an insane desktop that looks like what hacker movies in the 1990s looked like). There’s no guarantee that the new firmware actually fixes whatever problem I’ve hit, due to the lack of changelogs, but given it was the first thing they asked me to try, I’m going to say there’s a strong possibility it’s a known bug.
PSA: OCZ Vector SSD firmware. is a post from: codemonkey.org.uk
May 16, 2013 04:16 PM
Last month Tommi found a kernel bug in perf_swevent_init using trinity, and posted a fix upstream. This apparently turned out to be a local root. Someone released an exploit for it this week. (interesting dissection of the exploit by spender here).
The code to fuzz perf_event_open was added to Trinity in November 2011. Yet for some reason, we only started to hit this recently. The sanitise routine for this syscall is still pretty basic, even after I added a little more to it yesterday. There’s probably more fruit on that branch somewhere.
There’s a date in the exploit code that claims it was written shortly after the affected code was merged upstream in 2010. Assuming that’s true, it’s taken way too long to find this. Trinity should have found this a lot sooner.
CVE-2013-2094. Another day, another fuzzed bug. is a post from: codemonkey.org.uk
May 16, 2013 03:01 PM
May 15, 2013
3.10rc1 came out a few days ago. At 12,000 changesets, lwn calls it the busiest such ever. Statements like that usually make me nervous. But things are generally in pretty good shape. Much better than 3.9rc1 was.
- There has been nowhere near the same level of fallout from trinity this cycle. The only bug I’m reliably hitting has been around for a while (connect vs sendmsg udpv6 oops)
- I hit a few crash-in-early-boot bugs that were a pain to debug. (fixes still pending merge)
- Some slab corruption found in XFS. (again, fixes pending merge). There’s some talk on lkml about an ext3 issue with the same symptoms, but I’ve not managed to reproduce this (yet?).
and that’s been about it.
Generally feeling pretty solid. Fedora 19 is still going to ship with 3.9, but we’ll likely have a 3.10.x update on day of release.
3.10rc1 testing status is a post from: codemonkey.org.uk
May 15, 2013 02:40 PM
One of the common arguments against solar as an energy source is that it’s just too variable. You can never count on it when you need it. What if clouds roll in and out? [1]
One counter-argument might be – well, you never know when anyone will turn on their AC, either, at least not minute-by-minute. The grid is a balancing act; unpredictable, random loads have the same effect as unpredictable, random generators.
To which one might then counter yes, but there are so many AC units out there, they average out, more or less, turning on and off at random times and smoothing things out in aggregate.
To which the solar advocate might reply OK, then with enough solar the peaks and valleys of generation should cancel out too, as clouds move out of one area into another. Does this seem likely to work out in practice?
To find out, I grabbed 5 minute data from about 40 Enphase systems in the twin cities on a highly variable, sporadically cloudy day. Because we don’t yet have a whole lot of solar here, and I didn’t want the one or two large commercial systems in the group to swamp the smaller residential systems, first I normalized them all to a % of their max output. (This might be cheating a little, but with a lot more systems randomly distributed in size and geography, the swamping-out effect should be minimized.) Here’s what just 4 of those systems look like; each is indeed pretty messy and unpredictable at the 5-minute range:
Then I averaged all of the systems. Here’s what the average looks like, compared to one of the individual systems:
It appears that things certainly do smooth out when we look at geographically distributed systems. If I were a grid operator, I might feel a lot better about that.
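The normalize-then-average step described above can be sketched in a few lines of Python. This is only an illustration: the function names (`normalize`, `fleet_average`, `max_step`) and the sample readings are made up for the example, not the Enphase data or the code actually used for the plots.

```python
from statistics import mean

def normalize(series):
    """Scale one system's readings to a fraction of its own peak output."""
    peak = max(series)
    if peak == 0:
        return [0.0] * len(series)
    return [value / peak for value in series]

def fleet_average(systems):
    """Average the normalized output of all systems at each time step."""
    normalized = [normalize(series) for series in systems]
    return [mean(samples) for samples in zip(*normalized)]

def max_step(series):
    """Largest change between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))

# Hypothetical 5-minute readings in watts; synthetic numbers, not real data.
systems = [
    [3000, 900, 2800, 1200, 3000],        # residential
    [1000, 2900, 1100, 3000, 900],        # residential
    [40000, 38000, 12000, 39000, 15000],  # commercial, far larger
]

average = fleet_average(systems)
print("largest swing, one system:", round(max_step(normalize(systems[0])), 2))
print("largest swing, average:   ", round(max_step(average), 2))
```

Because every system is scaled to its own peak first, the large commercial system counts no more than a small rooftop one, which is exactly the "cheating a little" caveat mentioned earlier.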
The caveats might be that this is a very wide geographic range – I grabbed systems from all of the twin cities and suburbs. And that’s probably larger than the various sub-grids within the cities; what the variability is within those sub-grids, or how this solar variability affects them, I’m not sure. And of course my initial normalization of all systems to the same size could be argued with.
There have been much more rigorous papers and presentations written on this as well, see for example “Quantifying PV Power Output Variability” by Thomas E. Hoff and Richard Perez in 1999, and “Implications of Wide-Area Geographic Diversity for Short-Term Variability of Solar Power” by Andrew Mills and Ryan Wiser at LBNL in 2010. But with the advent of 5-minute monitoring from systems like Enphase, I wonder if even better results could be found from this wealth of data.
[1] I’ll submit that a sporadically cloudy day is more trouble to a grid operator than a generally cloudy day. We often know if a day will be cloudy well ahead of time, and that doesn’t yield the minute-to-minute variations of a sporadically cloudy day. The grid is better, I think, at responding to these longer-term variations.
May 15, 2013 02:39 AM
May 14, 2013
...are here. Recording was going on, but I'm not sure if it is online somewhere...
May 14, 2013 12:54 PM
May 13, 2013
Whoops. Looks like I forgot to post my slides from last year’s LinuxCon Japan talk on the Linux kernel security subsystem.
Here they are:
http://namei.org/presentations/kernel-security-state-linuxconjp-2012b.pdf
I’ll be giving an update at the upcoming LinuxCon Japan in Tokyo in a couple of weeks.
May 13, 2013 11:14 AM
May 10, 2013
| | 17 | 18 | 19 | rawhide | (total) |
| Open: | 273 | 348 | 117 | 73 | (811) |
| Opened since 2013-05-03 | 7 | 17 | 8 | 6 | (38) |
| Closed since 2013-05-03 | 7 | 20 | 14 | 4 | (45) |
| Changed since 2013-05-03 | 20 | 41 | 20 | 7 | (88) |
Nothing terribly exciting in this week’s new bugs. Backlog continues to slowly get beaten down. Next week should see a rebase to 3.9 for F18.
Weekly Fedora kernel bug statistics – May 10th 2013 is a post from: codemonkey.org.uk
May 10, 2013 04:18 PM
May 09, 2013
This is not something to brag about, but apparently I managed to program computers for about 30 years without writing unit tests. Today that is rectified: I voluntarily added a test to one of my projects. I encountered the goodness of build-time testing when working on Jeff Garzik's Project Hail. And of course, OpenStack, including Swift, had them since forever. Those weren't my projects, however.
May 09, 2013 08:12 PM
I'm still running squeeze on my X60... and I decided that with wheezy becoming "stable", it was a good idea to upgrade. Before I started, I did back up my root filesystem (fortunately), with
cp -a --one-file-system / somewhere
The upgrade was a bit of a fight (like aptitude trying to take hours of cpu time), but eventually I succeeded... Only to realize that the system no longer boots into GUI and (worse) that gnome2 is gone. I'm not a great fan of gnome3; definitely not on X60, anyway. Its animations feel excessive even when the system is unloaded; if there's some background load it quickly becomes unusable. I googled a bit, and it did not look like going back to gnome2 is exactly easy.
So I went back from the backup. First, chromium refused to run because the new version broke the config files. I restored those from backup. But next... strangely my self-compiled 3.9 kernel stopped working. The stock debian kernel kept running, but my own kernel ran init, and then rsyslogd broke the boot.
Can you guess what went wrong?
pc jvgu bar svyr flfgrz bcgvba vf ernyyl onq vqrn; vg jvyy abg pbcl rirelguvat sebz lbhe / svyrflfgrz, va cnegvphyne vg jvyy abg pbcl /qri, orpnhfr gurer'f gz csf zbhagrq bire vg. Bhpu.
May 09, 2013 12:20 PM
May 07, 2013
I've been working on TPMs lately. It turns out that they're moderately awful, but what's significantly more awful is basically all the existing documentation. So here's some of what I've learned, presented in the hope that it saves someone else some amount of misery.
What is a TPM?
TPMs are devices that adhere to the Trusted Computing Group's Trusted Platform Module specification. They're typically microcontrollers[1] with a small amount of flash, and attached via either i2c (on embedded devices) or LPC[2] (on PCs). While designed for performing cryptographic tasks, TPMs are not cryptographic accelerators - in almost all situations, carrying out any TPM operations on the CPU instead would be massively faster[3]. So why use a TPM at all?
Keeping secrets with a TPM
TPMs can encrypt and decrypt things. They're not terribly fast at doing so, but they have one significant benefit over doing it on the CPU - they can do it with keys that are tied to the TPM. All TPMs have something called a Storage Root Key (or SRK) that's generated when the TPM is initially configured. You can ask the TPM to generate a new keypair, and it'll do so, encrypt it with the SRK (or another key descended from the SRK) and hand it back to you. Other than the SRK (and another key called the Endorsement Key, which we'll get back to later), these keys aren't actually kept on the TPM - the running OS stores them on disk. If the OS wants to encrypt or decrypt something, it loads the key into the TPM and asks it to perform the desired operation. The TPM decrypts the key and then goes to work on the data. For small quantities of data, the secret can even be stored in the TPM's nvram rather than on disk.
All of this means that the keys are tied to a system, which is great for security. An attacker can't obtain the decrypted keys, even if they have a keylogger and full access to your filesystem. If I encrypt my laptop's drive and then encrypt the decryption key with the TPM, stealing my drive won't help even if you have my passphrase - any other TPM simply doesn't have the keys necessary to give you access.
That's fine for keys which are system specific, but what about keys that I might want to use on multiple systems, or keys that I want to carry on using when I need to replace my hardware? Keys can optionally be flagged as migratable, which makes it possible to export them from the TPM and import them to another TPM. This seems like it defeats most of the benefits, but there's a couple of features that improve security here. The first is that you need the TPM ownership password, which is something that's set during initial TPM setup and then not usually used afterwards. An attacker would need to obtain this somehow. The other is that you can set limits on migration when you initially import the key. In this scenario the TPM will only be willing to export the key by encrypting it with a pre-configured public key. If the private half is kept offline, an attacker is still unable to obtain a decrypted copy of the key.
So I just replace the OS with one that steals the secret, right?
Say my root filesystem is encrypted with a secret that's stored on the TPM. An attacker can replace my kernel with one that grabs that secret once the TPM's released it. How can I avoid that?
TPMs have a series of Platform Configuration Registers (PCRs) that are used to record system state. These all start off programmed to zero, but applications can extend them at runtime by writing a sha1 hash into them. The new hash is concatenated to the existing PCR value and another sha1 calculated, and then this value is stored in the PCR. The firmware hashes itself and various option ROMs and adds those values to some PCRs, and then grabs the bootloader and hashes that. The bootloader then hashes its configuration and the files it reads before executing them.
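As a rough illustration of the extend operation described above, here is a minimal Python sketch that models a single PCR in software. The function names and the sample component blobs are hypothetical; a real TPM does this in hardware, and this only mirrors the arithmetic (SHA-1 over the old PCR value concatenated with the new measurement hash).

```python
import hashlib

PCR_SIZE = 20  # length of a SHA-1 digest in bytes

def extend(pcr: bytes, measured_data: bytes) -> bytes:
    """Model of a PCR extend: new = SHA1(old PCR value || SHA1(data))."""
    measurement = hashlib.sha1(measured_data).digest()
    return hashlib.sha1(pcr + measurement).digest()

def measure_chain(components) -> bytes:
    """Start from an all-zero PCR and extend it with each component in order."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr

# Hypothetical boot components, stand-ins for real firmware and kernel images.
good = measure_chain([b"firmware", b"bootloader", b"kernel"])
tampered = measure_chain([b"firmware", b"evil bootloader", b"kernel"])

print(good.hex())
print(tampered.hex())
print("match:", good == tampered)
```

Since each new value depends on the previous one, a changed component alters every later PCR value, and the only way back to the original value would be to find a SHA-1 input that produces it.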
This chain of trust means that you can verify that no prior system component has been modified. If an attacker modifies the bootloader then the firmware will calculate a different hash value, and there's no way for the attacker to force that back to the original value. Changing the kernel or the initrd will result in the same problem. Other than replacing the very low level firmware code that controls the root of trust, there's no way an attacker can replace any fundamental system components without changing the hash values.
TPMs support using these hash values to decide whether or not to perform a decryption operation. If an attacker replaces the initrd, the PCRs won't match and the TPM will simply refuse to hand over the secret. You can actually see this in use on Windows devices using Bitlocker - if you do anything that would change the PCR state (like booting into recovery mode), the TPM won't hand over the key and Bitlocker has to prompt for a recovery key. Choosing which PCRs to care about is something of a balancing act. Firmware configuration is typically hashed into PCR 1, so changing any firmware configuration options will change it. If PCR 1 is listed as one of the values that must match in order to release the secret, changing any firmware options will prevent the secret from being released. That's probably overkill. On the other hand, PCR 0 will normally contain the firmware hash itself. Including this means that the user will need to recover after updating their firmware, but failing to include it means that an attacker can subvert the system by replacing the firmware.
What about using TPMs for DRM?
In theory you could populate TPMs with DRM keys for media playback, and seal them such that the hardware wouldn't hand them over. In practice this is probably too easily subverted or too user-hostile - changing default boot order in your firmware would result in validation failing, and permitting that would allow fairly straightforward subverted boot processes. You really need a finer grained policy management approach, and that's something that the TPM itself can't support.
This is where Remote Attestation comes in. Rather than keep any secrets on the local TPM, the TPM can assert to a remote site that the system is in a specific state. The remote site can then make a policy determination based on multiple factors and decide whether or not to hand over session decryption keys. The idea here is fairly straightforward. The remote site sends a nonce and a list of PCRs. The TPM generates a blob with the requested PCR values, sticks the nonce on, encrypts it and sends it back to the remote site. The remote site verifies that the reply was encrypted with an actual TPM key, makes sure that the nonce matches and then makes a policy determination based on the PCR state.
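The nonce-and-PCR exchange above can be sketched as a toy in Python. Big assumption, stated loudly: a real TPM protects the reply with an asymmetric key held inside the TPM, whereas this sketch substitutes an HMAC over a shared secret purely to show the message flow. All names here (`make_quote`, `verify_quote`, `device_key`) are hypothetical.

```python
import hashlib
import hmac
import os

def make_quote(key, pcrs, nonce, selection):
    """Device side: bundle the requested PCR values with the nonce and tag them.
    HMAC is a stand-in for the TPM's own key operation (an assumption)."""
    values = b"".join(pcrs[index] for index in selection)
    tag = hmac.new(key, values + nonce, hashlib.sha1).digest()
    return values, tag

def verify_quote(key, expected_pcrs, nonce, selection, values, tag):
    """Remote side: check the tag covers our nonce, then compare PCR state."""
    good_tag = hmac.new(key, values + nonce, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, good_tag):
        return False  # wrong key, altered values, or a replayed old nonce
    wanted = b"".join(expected_pcrs[index] for index in selection)
    return hmac.compare_digest(values, wanted)  # the policy decision

# Hypothetical PCR contents and key, for illustration only.
device_key = os.urandom(32)
pcrs = {0: hashlib.sha1(b"firmware").digest(),
        4: hashlib.sha1(b"bootloader").digest()}
selection = [0, 4]

nonce = os.urandom(20)
values, tag = make_quote(device_key, pcrs, nonce, selection)
print("fresh quote accepted:", verify_quote(device_key, pcrs, nonce, selection, values, tag))
```

The nonce is what stops an attacker from recording one good reply and replaying it later: the remote site picks a new nonce for every request, so an old reply no longer verifies.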
But hold on. How does the remote site know that the reply was encrypted with an actual TPM? When TPMs are built, they have something called an Endorsement Key (EK) flashed into them. The idea is that the only way to have a valid EK is to have a TPM, and that the TPM will never release this key to anything else. There's a couple of problems here. The first is that proving you have a valid EK to a remote site involves having a chain of trust between the EK and some globally trusted third party. Most TPMs don't have this - the only ones I know of that do are recent Infineon and STMicro parts. The second is that TPMs only have a single EK, and so any site performing remote attestation can cross-correlate you with any other site. That's a pretty significant privacy concern.
There's a theoretical solution to the privacy issue. TPMs never actually sign PCR quotes with the EK. Instead, TPMs can generate something called an Attestation Identity Key (AIK) and sign it with the EK. The OS can then provide this to a site called a PrivacyCA, which verifies that the AIK is signed by a real EK (and hence a real TPM). When a third party site requests remote attestation, the TPM signs the PCRs with the AIK and the third party site asks the PrivacyCA whether the AIK is real. You can have as many AIKs as you want, so you can provide each service with a different AIK.
As long as the PrivacyCA only keeps track of whether an AIK is valid and not which EK it was signed with, this avoids the privacy concerns - nobody would be able to tell that multiple AIKs came from the same TPM. On the other hand, it makes any PrivacyCA a pretty attractive target. Compromising one would not only allow you to fake up any remote attestation requests, it would let you violate user privacy expectations by seeing that (say) the TPM being used to attest to HolyScriptureVideos.com was also being used to attest to DegradingPornographyInvolvingAnimals.com.
Perhaps unsurprisingly (given the associated liability concerns), there are no public and trusted PrivacyCAs yet, and even if there were (a) many computers are still being sold without TPMs and (b) even those with TPMs often don't have the EK certificate that would be required to make remote attestation possible. So while remote attestation could theoretically be used to impose DRM in a way that would require you to be running a specific OS, practical concerns make it pretty difficult for anyone to deploy that at any point in the near future.
Is this just limited to early OS components?
Nope. The Linux kernel has support for measuring each binary run or each module loaded and extending PCRs accordingly. This makes it possible to ensure that the running binaries haven't been modified on disk. There's not a lot of distribution infrastructure for setting this up, but in theory a distribution could deploy an entirely signed userspace and allow the user to opt into only executing correctly signed binaries. Things get more interesting when you add interpreted scripts to the mix, so there's still plenty of work to do there.
So what can I actually use a TPM for?
Drive encryption is probably the best example (Bitlocker does it on Windows, and there's a LUKS-based implementation for Linux here) - while in theory you could do things like use your TPM as a factor in two-factor authentication or tie your GPG key to it, there's not a lot of existing infrastructure for handling all of that. For the majority of people, the most useful feature of the TPM is probably the random number generator. rngd has support for pulling numbers out of it and stashing them in /dev/random, and it's probably worth doing that unless you have an Ivy Bridge or other CPU with an RNG.
Things get more interesting in more niche cases. Corporations can bind VPN keys to corporate machines, making it possible to impose varying security policies. Intel use the TPM as part of their anti-theft technology on education-oriented devices like the Classmate. And in the cloud, projects like Trusted Computing Pools use remote attestation to verify that compute nodes are in a known good state before scheduling jobs on them.
Is there a threat to freedom?
At the moment, probably not. The lack of any workable general purpose remote attestation makes it difficult for anyone to impose TPM-based restrictions on users, and any local code is obviously under the user's control - got a program that wants to read the PCR state before letting you do something? LD_PRELOAD something that gives it the desired response, or hack it so it ignores failure. It's just far too easy to circumvent.
Summary?
TPMs are useful for some very domain-specific applications, drive encryption and random number generation. The current state of technology doesn't make them useful for practical limitations of end-user freedom.
[1] Ranging from 8-bit things that are better suited to driving washing machines, up to full ARM cores
[2] "Low Pin Count", basically ISA without the slots.
[3] Loading a key and decrypting a 5 byte payload takes 1.5 seconds on my laptop's TPM.

May 07, 2013 05:18 PM
May 06, 2013
The CFP for the 2013 Linux Security Summit has been announced.
The summit will be held across the 19th and 20th of September in New Orleans, co-located again with LinuxCon and Linux Plumbers. Note that presenters and attendees at LSS must be registered as LinuxCon attendees.
We’ll be following a similar format to last year, with a day of refereed presentations, followed by subsystem updates and break-out sessions on the second day. We’ll probably finish up around lunchtime on the Friday for people needing to head home that day, but check the final schedule for details once it’s published.
The CFP is open until 14th June, with speaker notifications to be posted by 21st June.
If you’ve been doing cool and interesting work in Linux security, be sure to submit a proposal!
May 06, 2013 09:59 AM
May 04, 2013
I published a quick overview on how to do TSX profiling with Linux perf: Intel TSX profiling with Linux perf
This is a technical overview that assumes some prior knowledge of profiling. I apologize for the cumbersome title.
May 04, 2013 02:53 AM
May 03, 2013
| | 17 | 18 | 19 | rawhide | (total) |
| Open: | 270 | 345 | 126 | 70 | (811) |
| Opened since 2013-04-26 | 4 | 24 | 9 | 6 | (43) |
| Closed since 2013-04-26 | 12 | 28 | 8 | 9 | (57) |
| Changed since 2013-04-26 | 15 | 52 | 18 | 7 | (92) |
Weekly Fedora kernel bug statistics – May 03 2013 is a post from: codemonkey.org.uk
May 03, 2013 04:10 PM
April 30, 2013
I've released man-pages-3.51. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.
This is a relatively small release that has various fixes across a number of pages. Among the more notable changes in man-pages-3.51 are the following:
- Documentation of various /proc interfaces was added in a number of pages.
- Various architecture-specific details were added to the syscall(2) and clone(2) pages.
April 30, 2013 07:31 AM
Content copyright by their respective authors.