On Fedora 18 (and every Unix since forever):
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
On Fedora 19 & Rawhide:
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
ln: failed to create hard link ‘bar’ => ‘foo’: Operation not permitted
Why?
Rich.
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
On Mon, Jul 15, 2013 at 12:35:58PM -0500, Chris Adams wrote:
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
The description there ["follows a link belonging to another user"] doesn't describe the change [preventing the link being created in the first place].
Rich.
On 15.07.2013 19:35, Chris Adams wrote:
Once upon a time, Richard W.M. Jones rjones@redhat.com said:
Why?
http://docs.fedoraproject.org/en-US/Fedora/19/html/Release_Notes/sect-Releas...
There is a BUG in the documentation or in my Fedora 19 system!
These lines mentioned in the documentation:
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
are in the /usr/lib/sysctl.d/50-default.conf file, not in /usr/lib/sysctl.d/00-system.conf as written there. Is anyone checking this before releasing the documentation, or was this new feature so hot that it finally landed in a different file than it was initially designed for?
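Whichever sysctl.d file ships the setting, the values actually in effect can be read straight from /proc. A minimal check (a sketch, assuming a Linux /proc layout; returns None on systems without the knob):

```python
from pathlib import Path

def protected_hardlinks():
    """Return the current fs.protected_hardlinks value as a string,
    or None if this kernel doesn't have the knob."""
    p = Path("/proc/sys/fs/protected_hardlinks")
    return p.read_text().strip() if p.exists() else None

print(protected_hardlinks())
```

On a stock Fedora 19 install this prints "1"; setting it back to 0 in a higher-priority sysctl.d file restores the old behaviour.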
Mateusz Marzantowicz
On Mon, Jul 15, 2013 at 06:32:50PM +0100, Richard W.M. Jones wrote:
On Fedora 18 (and every Unix since forever):
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
On Fedora 19 & Rawhide:
$ touch foo
$ sudo chown root.root foo
$ ln foo bar
ln: failed to create hard link ‘bar’ => ‘foo’: Operation not permitted
Why?
OK apparently:
http://danwalsh.livejournal.com/64493.html
So we're not following upstream here? (or rather, we're following upstream but cheating by overriding upstream preferences).
Rich.
On 07/15/2013 07:32 PM, Richard W.M. Jones wrote:
Why?
Without it, it's possible to exploit certain weaknesses to make /etc/shadow world-readable or worse, for example.
Hard links are fundamentally incompatible with the way we run SELinux, and this change mitigates that issue to some extent.
On Tue, Jul 16, 2013 at 10:42:10AM +0200, Florian Weimer wrote:
On 07/15/2013 07:32 PM, Richard W.M. Jones wrote:
Why?
Without it, it's possible to exploit certain weaknesses to make /etc/shadow world-readable or worse, for example.
Hard links are fundamentally incompatible with the way we run SELinux, and this change mitigates that issue to some extent.
Any more information on this?
FWIW this change caused a segfault in OpenStack (now fixed, but there's a larger problem remaining - RHBZ#983218).
Rich.
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Really it had nothing to do with hard links at all - the same symptom could have happened on ENOSPC for write().
On 07/16/2013 12:41 PM, Colin Walters wrote:
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Or: uncovered a pre-existing bug that could then be addressed..
Not sure how that is a bad thing.
Really it had nothing to do with hard links at all - the same symptom could have happened on ENOSPC for write().
Quite.
Bryn.
On Tue, Jul 16, 2013 at 01:07:40PM +0100, Bryn M. Reeves wrote:
On 07/16/2013 12:41 PM, Colin Walters wrote:
On Tue, 2013-07-16 at 10:42 +0100, Richard W.M. Jones wrote:
FWIW this change caused a segfault in OpenStack
This phrase is very dramatic. I'd say "triggered a double free in an untested libguestfs error path" is more accurate and less dramatic.
Or: uncovered a pre-existing bug that could then be addressed..
Not sure how that is a bad thing.
Changing the kernel interface exposed a pre-existing bug, but it certainly did cause a segfault.
The other part of this is we're going to have to rewrite some perfectly working code to fit this new assumption. I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Rich.
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
On Tue, Jul 16, 2013 at 11:50:10AM -0400, Colin Walters wrote:
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
Thanks for any input. It's this code below. The big comment at the top explains roughly what's going on, but misses out a crucial detail: If you are using libvirt in a certain way, then libvirtd [which runs as root] will chown the 'kernel' and 'initrd' files to root.root. If that chown didn't happen, then none of this would be a problem.
https://github.com/libguestfs/libguestfs/blob/master/src/appliance.c#L68
Rich.
On Tue, Jul 16, 2013 at 05:42:00PM +0100, Richard W.M. Jones wrote:
On Tue, Jul 16, 2013 at 11:50:10AM -0400, Colin Walters wrote:
On Tue, 2013-07-16 at 15:59 +0100, Richard W.M. Jones wrote:
I'm not even sure how to do that because we want the atomic behaviour of hard links, and we want to have qemu running as a different user (for security, oh the irony), so there's no other obvious way to code it.
Can you link to or describe the relevant algorithm/code? The file https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py#... doesn't contain an obvious call to link().
Thanks for any input. It's this code below. The big comment at the top explains roughly what's going on, but misses out a crucial detail: If you are using libvirt in a certain way, then libvirtd [which runs as root] will chown the 'kernel' and 'initrd' files to root.root. If that chown didn't happen, then none of this would be a problem.
https://github.com/libguestfs/libguestfs/blob/master/src/appliance.c#L68
I'll just add a bit about how the locking works in this code since it may not be obvious.
There's a lock (building_lock) which stops two threads from the same process from entering the appliance building code in parallel.
There's also a lock (fcntl held on the file 'checksums') which stops two processes owned by the same user from trying to rebuild the appliance in parallel.
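That second, cross-process lock can be sketched like this (a hypothetical stand-in, not the libguestfs code; it takes a POSIX record lock on a 'checksums' file the same way the appliance builder does, so a second process of the same user blocks until the first finishes):

```python
import fcntl
import os
import tempfile

def rebuild_appliance(cachedir):
    """Rebuild the appliance while holding an exclusive fcntl lock
    on the 'checksums' file in the per-user cache directory."""
    lockfile = os.path.join(cachedir, "checksums")
    with open(lockfile, "a+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until we own the lock
        try:
            # ... build kernel/initrd/root here ...
            return "built"
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)

cache = tempfile.mkdtemp()
print(rebuild_appliance(cache))
```

fcntl locks are per-process, which is why the separate in-process mutex (building_lock) is still needed for threads.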
And different users have their own cache directory (the directory name includes the effective uid), so those don't interfere.
BUT after we've built the appliance, we have to pass it over (by reference to the filename) to qemu. Now qemu runs in its own sweet time a bit later on, and we can never know when qemu will actually read these files (and because qemu may run for an indefinitely long period of time, we can't block other processes here). That's where the hard link comes in: it gives qemu effectively its own copy of the kernel, initrd and root files, while letting us safely erase [the hard-linked alias] from another process.
These files are huge so copying them isn't an alternative.
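The handoff trick itself is easy to demonstrate (a toy sketch with made-up file names, not the appliance code): the hard link gives qemu a second name for the same inode, so the cached copy can be unlinked and rebuilt while qemu's name keeps working:

```python
import os
import tempfile

d = tempfile.mkdtemp()
cached = os.path.join(d, "kernel")          # the shared cached appliance kernel
with open(cached, "w") as f:
    f.write("vmlinuz v1")

qemu_copy = os.path.join(d, "kernel.for-qemu")
os.link(cached, qemu_copy)                  # atomic: same inode, second name

os.unlink(cached)                           # another process rebuilds the cache
with open(os.path.join(d, "kernel"), "w") as f:
    f.write("vmlinuz v2")

# qemu's name still refers to the original data
with open(qemu_copy) as f:
    print(f.read())                         # vmlinuz v1
```

No data is copied at any point, which is the whole attraction when the files are huge.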
Rich.
On Tue, 2013-07-16 at 17:59 +0100, Richard W.M. Jones wrote:
There's a lock (building_lock) which stops two threads from the same process from entering the appliance building code in parallel.
There's also a lock (fcntl held on the file 'checksums') which stops two processes owned by the same user from trying to rebuild the appliance in parallel.
What we're trying to achieve with this is that if I do e.g. guestmount from two separate shells, only one of them will create the appliance?
That makes sense.
And different users have their own cache directory (the directory name includes the effective uid), so those don't interfere.
Though there's the usual DoS problem from predictably-named directories in /{,var/}tmp. Consider /run/user/$UID/libguestfs-cache/ being a symbolic link to a randomly named /var/tmp/libguestfs.XXXXXX.
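The usual mitigation for a predictably-named directory is to refuse it unless it is a real directory owned by us. A hedged sketch (hypothetical helper, not actual libguestfs code):

```python
import os
import stat
import tempfile

def ensure_private_dir(path):
    """Create or validate a private cache directory.

    Rejects the path if it is a symlink or owned by another uid,
    which is how a predictably-named /tmp directory gets exploited."""
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        pass
    st = os.lstat(path)                     # lstat: do not follow symlinks
    if stat.S_ISLNK(st.st_mode) or not stat.S_ISDIR(st.st_mode):
        raise RuntimeError("%s is not a real directory" % path)
    if st.st_uid != os.geteuid():
        raise RuntimeError("%s is owned by uid %d" % (path, st.st_uid))
    return path

base = tempfile.mkdtemp()
print(os.path.basename(ensure_private_dir(os.path.join(base, "cache"))))
```

This still has a check-then-use window in hostile shared directories; hence the suggestion to keep the real data under a randomly named path.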
BUT after we've built the appliance, we have to pass it over (by reference to the filename) to qemu. Now qemu runs in its own sweet time a bit later on, and we can never know when qemu will actually read these files (and because qemu may run for an indefinitely long period of time, we can't block other processes here). That's where the hard link comes in: it gives qemu effectively its own copy of the kernel, initrd and root files, while letting us safely erase [the hard-linked alias] from another process.
qemu *could* export some signaling interface to the effect of "I have called open() on all file arguments specified on the command line". That seems like it'd be generally useful.
But ok, so as you said in the followup, the real root issue here is libvirt calling chown. It's not clear to me why it does so. It seems fine if a user suitably privileged to create VMs can crash them afterwards by writing to or unlinking the kernel/initramfs.
On Tue, Jul 16, 2013 at 01:37:33PM -0400, Colin Walters wrote:
But ok, so as you said in the followup, the real root issue here is libvirt calling chown [on the kernel and initrd]. It's not clear to me why it does so.
It's a good question, and trying to formulate an answer made me question some fundamental assumptions. I'm CC-ing libvir-list to see if anyone has some ideas.
----------------------------------------
Some background: We've got three users involved here.
nova (some $UID) -- invokes --> libvirtd (root) -- invokes --> qemu (qemu.qemu)
The reason for this dance of three users is security. It provides extra separation in case a rogue filesystem that we're examining exploits the guest kernel and qemu (which is reckoned to be rather easy). We don't want that rogue qemu process to be able to escalate this to an attack on either the host or the process (nova) running libguestfs.
SELinux is used too for extra extra security, but you may be amazed that it's not unknown for people to turn SELinux off.
----------------------------------------
Now does libvirtd need to chown kernel & initrd?
Ideally (and something that's slowly being worked towards) libvirtd would open all resources on qemu's behalf, and pass opened file descriptors down to qemu. In the case of kernel and initrd libvirtd could do this already by passing /dev/fd/<N> paths to qemu. This, I think, would avoid any need to chown those.
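The /dev/fd/<N> trick can be sketched in a few lines (a toy stand-in for libvirtd and qemu, with cat as the child and a made-up file; requires Linux's /dev/fd -> /proc/self/fd). Note the open fd keeps the file readable even after the name is gone, which is exactly the property wanted:

```python
import os
import subprocess
import tempfile

# stand-in for the kernel file libvirtd would open on qemu's behalf
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("fake-vmlinuz")
    path = f.name

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                  # the open fd keeps the inode alive

# the child (stand-in for qemu) opens it by its /dev/fd name
out = subprocess.run(["cat", "/dev/fd/%d" % fd], pass_fds=(fd,),
                     capture_output=True, text=True)
os.close(fd)
print(out.stdout)                # fake-vmlinuz
```

Because the child re-opens the path, no chown is needed at all: the parent's permission to open the file is what counts.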
For the appliance disk, /dev/fd/<N> also works, but only because we're not hot-plugging this disk. (When you have to hot-plug a disk, you are talking to qemu over the qemu monitor, so the only way to pass fd's over is to use SCM_RIGHTS).
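For the hot-plug case, fd passing over the monitor socket means SCM_RIGHTS ancillary data. Python 3.9+ wraps this directly; a minimal sketch with a socketpair standing in for the qemu monitor connection:

```python
import os
import socket
import tempfile

# a file the "sender" has open, plus a socketpair as the monitor channel
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("disk-image-bytes")
    path = f.name
fd = os.open(path, os.O_RDONLY)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(a, [b"getfd"], [fd])          # fd travels as SCM_RIGHTS

msg, fds, flags, addr = socket.recv_fds(b, 1024, 1)
received = fds[0]                             # a freshly-duplicated fd
print(os.pread(received, 16, 0).decode())     # disk-image-bytes
```

The receiver gets its own duplicate of the open file description, so again no chown is required for qemu to read the disk.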
Could we just rely on Unix permissions? I think the answer is yes for kernel & initrd (make them world readable), at least in this case. If kernel & initrd had private data, you wouldn't want to do this. The appliance disk is a little more tricky.
Rich.