On Fedora 18 (and every Unix since forever):
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
On Fedora 19 & Rawhide:
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
ln: failed to create hard link ‘bar’ => ‘foo’: Operation not permitted
Why?
Rich.
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
On Mon, Jul 15, 2013 at 12:35:58PM -0500, Chris Adams wrote:
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
The description there ["follows a link belonging to another user"] doesn't describe the change [preventing the link being created in the first place].
Rich.
On 15.07.2013 19:35, Chris Adams wrote:
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
There is a BUG in the documentation or in my Fedora 19 system!
These lines mentioned in the documentation:
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
are in the /usr/lib/sysctl.d/50-default.conf file, not in /usr/lib/sysctl.d/00-system.conf as written there. Is anyone checking this before releasing the documentation, or was this new feature so hot that it finally landed in a different file than it was initially designed for?
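Whichever sysctl.d file ships the setting, the values actually in effect can be read straight from /proc. A minimal check (a sketch, assuming a Linux /proc layout; returns None on systems without the knob):

```python
from pathlib import Path

def protected_hardlinks():
    """Return the current fs.protected_hardlinks value as a string,
    or None if this kernel doesn't have the knob."""
    p = Path("/proc/sys/fs/protected_hardlinks")
    return p.read_text().strip() if p.exists() else None

print(protected_hardlinks())
```

On a stock Fedora 19 install this prints "1"; setting it back to 0 in a higher-priority sysctl.d file restores the old behaviour.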
Mateusz Marzantowicz
On Mon, Jul 15, 2013 at 06:32:50PM +0100, Richard W.M. Jones wrote:
On Fedora 18 (and every Unix since forever):
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
On Fedora 19 & Rawhide:
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
ln: failed to create hard link ‘bar’ => ‘foo’: Operation not permitted
Why?
OK apparently:
http://danwalsh.livejournal.com/64493.html
So we're not following upstream here? (or rather, we're following upstream but cheating by overriding upstream preferences).
Rich.
On 07/15/2013 07:32 PM, Richard W.M. Jones wrote:
Why?
Without it, it's possible to exploit certain weaknesses to make /etc/shadow world-readable or worse, for example.
Hard links are fundamentally incompatible with the way we run SELinux, and this change mitigates that issue to some extent.
On Tue, Jul 16, 2013 at 10:42:10AM +0200, Florian Weimer wrote:
On 07/15/2013 07:32 PM, Richard W.M. Jones wrote:
Why?
Without it, it's possible to exploit certain weaknesses to make /etc/shadow world-readable or worse, for example.
Hard links are fundamentally incompatible with the way we run SELinux, and this change mitigates that issue to some extent.
Any more information on this?
FWIW this change caused a segfault in OpenStack (now fixed, but there's a larger problem remaining - RHBZ#983218).
Rich.
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Really it had nothing to do with hard links at all - the same symptom could have happened on ENOSPC for write().
On 07/16/2013 12:41 PM, Colin Walters wrote:
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Or: uncovered a pre-existing bug that could then be addressed..
Not sure how that is a bad thing.
Really it had nothing to do with hard links at all - the same symptom could have happened on ENOSPC for write().
Quite.
Bryn.
On Tue, Jul 16, 2013 at 01:07:40PM +0100, Bryn M. Reeves wrote:
On 07/16/2013 12:41 PM, Colin Walters wrote:
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Or: uncovered a pre-existing bug that could then be addressed..
Not sure how that is a bad thing.
Changing the kernel interface exposed a pre-existing bug, but it certainly did cause a segfault.
The other part of this is we're going to have to rewrite some perfectly working code to fit this new assumption. I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Rich.
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
On Tue, Jul 16, 2013 at 11:50:10AM -0400, Colin Walters wrote:
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
Thanks for any input. It's this code below. The big comment at the top explains roughly what's going on, but misses out a crucial detail: If you are using libvirt in a certain way, then libvirtd [which runs as root] will chown the 'kernel' and 'initrd' files to root.root. If that chown didn't happen, then none of this would be a problem.
https://github.com/libguestfs/libguestfs/blob/master/src/appliance.c#L68
Rich.
On Tue, Jul 16, 2013 at 05:42:00PM +0100, Richard W.M. Jones wrote:
On Tue, Jul 16, 2013 at 11:50:10AM -0400, Colin Walters wrote:
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
Thanks for any input. It's this code below. The big comment at the top explains roughly what's going on, but misses out a crucial detail: If you are using libvirt in a certain way, then libvirtd [which runs as root] will chown the 'kernel' and 'initrd' files to root.root. If that chown didn't happen, then none of this would be a problem.
https://github.com/libguestfs/libguestfs/blob/master/src/appliance.c#L68
I'll just add a bit about how the locking works in this code since it may not be obvious.
There's a lock (building_lock) which stops two threads from the same process from entering the appliance building code in parallel.
There's also a lock (fcntl held on the file 'checksums') which stops two processes owned by the same user from trying to rebuild the appliance in parallel.
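That second, cross-process lock can be sketched like this (a hypothetical stand-in, not the libguestfs code; it takes a POSIX record lock on a 'checksums' file the same way the appliance builder does, so a second process of the same user blocks until the first finishes):

```python
import fcntl
import os
import tempfile

def rebuild_appliance(cachedir):
    """Rebuild the appliance while holding an exclusive fcntl lock
    on the 'checksums' file in the per-user cache directory."""
    lockfile = os.path.join(cachedir, "checksums")
    with open(lockfile, "a+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until we own the lock
        try:
            # ... build kernel/initrd/root here ...
            return "built"
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)

cache = tempfile.mkdtemp()
print(rebuild_appliance(cache))
```

fcntl locks are per-process, which is why the separate in-process mutex (building_lock) is still needed for threads.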
And different users have their own cache directory (the directory name includes the effective uid), so those don't interfere.
BUT after we've built the appliance, we have to pass it over (by reference to the filename) to qemu. Now qemu runs in its own sweet time a bit later on, and we can never know when qemu will actually read these files (and because qemu may run for an indefinitely long period of time, we can't block other processes here). That's where the hard link comes in: it gives qemu effectively its own copy of the kernel, initrd and root files, while letting us safely erase [the hard-linked alias] from another process.
These files are huge so copying them isn't an alternative.
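The handoff trick itself is easy to demonstrate (a toy sketch with made-up file names, not the appliance code): the hard link gives qemu a second name for the same inode, so the cached copy can be unlinked and rebuilt while qemu's name keeps working:

```python
import os
import tempfile

d = tempfile.mkdtemp()
cached = os.path.join(d, "kernel")          # the shared cached appliance kernel
with open(cached, "w") as f:
    f.write("vmlinuz v1")

qemu_copy = os.path.join(d, "kernel.for-qemu")
os.link(cached, qemu_copy)                  # atomic: same inode, second name

os.unlink(cached)                           # another process rebuilds the cache
with open(os.path.join(d, "kernel"), "w") as f:
    f.write("vmlinuz v2")

# qemu's name still refers to the original data
with open(qemu_copy) as f:
    print(f.read())                         # vmlinuz v1
```

No data is copied at any point, which is the whole attraction when the files are huge.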
Rich.
On Tue, 2013-07-16 at 17:59 +0100, Richard W.M. Jones wrote:
There's a lock (building_lock) which stops two threads from the same process from entering the appliance building code in parallel.
There's also a lock (fcntl held on the file 'checksums') which stops two processes owned by the same user from trying to rebuild the appliance in parallel.
What we're trying to achieve with this is that if I do e.g. guestmount from two separate shells, only one of them will create the appliance?
That makes sense.
And different users have their own cache directory (the directory name includes the effective uid), so those don't interfere.
Though there's the usual DoS problem from predictably-named directories in /{,var/}tmp. Consider /run/user/$UID/libguestfs-cache/ being a symbolic link to a randomly named /var/tmp/libguestfs.XXXXXX.
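The usual mitigation for a predictably-named directory is to refuse it unless it is a real directory owned by us. A hedged sketch (hypothetical helper, not actual libguestfs code):

```python
import os
import stat
import tempfile

def ensure_private_dir(path):
    """Create or validate a private cache directory.

    Rejects the path if it is a symlink or owned by another uid,
    which is how a predictably-named /tmp directory gets exploited."""
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        pass
    st = os.lstat(path)                     # lstat: do not follow symlinks
    if stat.S_ISLNK(st.st_mode) or not stat.S_ISDIR(st.st_mode):
        raise RuntimeError("%s is not a real directory" % path)
    if st.st_uid != os.geteuid():
        raise RuntimeError("%s is owned by uid %d" % (path, st.st_uid))
    return path

base = tempfile.mkdtemp()
print(os.path.basename(ensure_private_dir(os.path.join(base, "cache"))))
```

This still has a check-then-use window in hostile shared directories; hence the suggestion to keep the real data under a randomly named path.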
BUT after we've built the appliance, we have to pass it over (by reference to the filename) to qemu. Now qemu runs in its own sweet time a bit later on, and we can never know when qemu will actually read these files (and because qemu may run for an indefinitely long period of time, we can't block other processes here). That's where the hard link comes in: it gives qemu effectively its own copy of the kernel, initrd and root files, while letting us safely erase [the hard-linked alias] from another process.
qemu *could* export some signaling interface to the effect of "I have called open() on all file arguments specified on the command line". That seems like it'd be generally useful.
But ok, so as you said in the followup, the real root issue here is libvirt calling chown. It's not clear to me why it does so. It seems fine if a user suitably privileged to create VMs can crash them afterwards by writing to or unlinking the kernel/initramfs.
On Tue, Jul 16, 2013 at 01:37:33PM -0400, Colin Walters wrote:
But ok, so as you said in the followup, the real root issue here is libvirt calling chown [on the kernel and initrd]. It's not clear to me why it does so.
It's a good question, and trying to formulate an answer made me question some fundamental assumptions. I'm CC-ing libvir-list to see if anyone has some ideas.
----------------------------------------
Some background: We've got three users involved here.
nova (some $UID) -- invokes --> libvirtd (root) -- invokes --> qemu (qemu.qemu)
The reason for this dance of three users is security. It provides extra separation in case a rogue filesystem that we're examining exploits the guest kernel and qemu (which is reckoned to be rather easy). We don't want that rogue qemu process to be able to escalate this to an attack on either the host or the process (nova) running libguestfs.
SELinux is used too for extra extra security, but you may be amazed that it's not unknown for people to turn SELinux off.
----------------------------------------
Now does libvirtd need to chown kernel & initrd?
Ideally (and something that's slowly being worked towards) libvirtd would open all resources on qemu's behalf, and pass opened file descriptors down to qemu. In the case of kernel and initrd libvirtd could do this already by passing /dev/fd/<N> paths to qemu. This, I think, would avoid any need to chown those.
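The /dev/fd/<N> trick can be sketched in a few lines (a toy stand-in for libvirtd and qemu, with cat as the child and a made-up file; requires Linux's /dev/fd -> /proc/self/fd). Note the open fd keeps the file readable even after the name is gone, which is exactly the property wanted:

```python
import os
import subprocess
import tempfile

# stand-in for the kernel file libvirtd would open on qemu's behalf
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("fake-vmlinuz")
    path = f.name

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                  # the open fd keeps the inode alive

# the child (stand-in for qemu) opens it by its /dev/fd name
out = subprocess.run(["cat", "/dev/fd/%d" % fd], pass_fds=(fd,),
                     capture_output=True, text=True)
os.close(fd)
print(out.stdout)                # fake-vmlinuz
```

Because the child re-opens the path, no chown is needed at all: the parent's permission to open the file is what counts.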
For the appliance disk, /dev/fd/<N> also works, but only because we're not hot-plugging this disk. (When you have to hot-plug a disk, you are talking to qemu over the qemu monitor, so the only way to pass fd's over is to use SCM_RIGHTS).
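For the hot-plug case, fd passing over the monitor socket means SCM_RIGHTS ancillary data. Python 3.9+ wraps this directly; a minimal sketch with a socketpair standing in for the qemu monitor connection:

```python
import os
import socket
import tempfile

# a file the "sender" has open, plus a socketpair as the monitor channel
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("disk-image-bytes")
    path = f.name
fd = os.open(path, os.O_RDONLY)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(a, [b"getfd"], [fd])          # fd travels as SCM_RIGHTS

msg, fds, flags, addr = socket.recv_fds(b, 1024, 1)
received = fds[0]                             # a freshly-duplicated fd
print(os.pread(received, 16, 0).decode())     # disk-image-bytes
```

The receiver gets its own duplicate of the open file description, so again no chown is required for qemu to read the disk.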
Could we just rely on Unix permissions? I think the answer is yes for kernel & initrd (make them world readable), at least in this case. If kernel & initrd had private data, you wouldn't want to do this. The appliance disk is a little more tricky.
Rich.