On Wed, Aug 10, 2011 at 11:29 PM, Bruno Wolff III <bruno(a)wolff.to> wrote:
> On Wed, Aug 10, 2011 at 16:35:23 -0500,
>   Ed Sutton <ESutton(a)fescorp.com> wrote:
> > After every livecd-creator build I must restart and reboot my CentOS 5.2 machine
> > using the fsck option to fix disk issues I do not understand. I tried a work-around
> > without success for a similar sounding problem:
> >
> > Bug 509427 - livecd-creator fails to unmount
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=509427
> >
> > Any suggestions on troubleshooting are much appreciated.
>
> The devices get umount'd in the wrong order. In later versions of
> livecd-creator we changed things so that lazy umounts are used.
> I think there is still a place where an exception can occur where things
> don't get cleaned up nicely, but for the most part exceptions don't
> leave lots of stuff mounted these days.
>
> I am not sure how well livecd-creator for recent Fedoras will work on CentOS 5.
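
As a rough illustration of the ordering point above: nested mounts (for example the bind mount under install_root//var/cache/yum) have to go before their parents, with a lazy umount as the fallback. This is only a sketch, not the livecd-creator code; the function names unmount_order and unmount_all are made up for the example, and path depth is used as a simple stand-in for real mount ordering.

```python
import subprocess


def unmount_order(mount_points):
    """Return mount points deepest-first so nested mounts precede their parents."""
    def depth(path):
        # Ignore empty components so "a//b" counts the same as "a/b".
        return len([part for part in path.split("/") if part])

    return sorted(mount_points, key=depth, reverse=True)


def unmount_all(mount_points, call=subprocess.call):
    """Unmount deepest-first; fall back to a lazy unmount (umount -l) on failure.

    Returns the list of paths that could not be unmounted either way.
    """
    failed = []
    for path in unmount_order(mount_points):
        if call(["/bin/umount", path]) != 0:
            # Lazy unmount detaches now and cleans up once the fs is no longer busy.
            if call(["/bin/umount", "-l", path]) != 0:
                failed.append(path)
    return failed
```

Path depth does not cover every case (two filesystems stacked on the same directory, for instance), so treat it as a starting point only.
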
It's not just CentOS; I've been seeing umount failures with
livecd-tools-16.3-1.fc16.x86_64.
At first it was failing when unmounting bind mounts, with a "block devices
not permitted on fs" message. In umount's source that message corresponds
to EACCES:
http://git.kernel.org/?p=utils/util-linux/util-linux.git;a=blob;f=mount/u...
Adding a 2-second sleep seems to paper over the issue; I haven't seen _that_
failure since applying the following patch:
--- fs.py.ORIG 2011-03-31 19:53:44.000000000 -0400
+++ fs.py 2011-08-10 12:19:31.219339890 -0400
@@ -142,6 +142,9 @@
if not self.mounted:
return
+ # sleep to try to avoid umount shenanigans
+ # e.g. umount: XXX/imgcreate-3OdaNp/install_root//var/cache/yum: block devices not permitted on fs
+ time.sleep(2)
rc = call(["/bin/umount", self.dest])
if rc != 0:
logging.info("Unable to unmount %s normally, using lazy unmount" % self.dest)
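
A fixed sleep is a blunt instrument. One possible alternative, sketched here under the assumption that the failure is transient, is to retry the normal umount a few times with a growing delay and only then fall back to the lazy unmount. The helper name umount_with_retry and its parameters are hypothetical and not part of fs.py.

```python
import subprocess
import time


def umount_with_retry(dest, attempts=5, delay=0.5,
                      call=subprocess.call, sleep=time.sleep):
    """Try a normal umount up to `attempts` times.

    Returns True as soon as one attempt succeeds, False if all fail.
    The wait grows linearly between attempts (delay, 2*delay, ...).
    """
    for attempt in range(attempts):
        if call(["/bin/umount", dest]) == 0:
            return True
        if attempt < attempts - 1:
            # Do not sleep after the final failed attempt.
            sleep(delay * (attempt + 1))
    return False
```

The caller would keep the existing behaviour on a False return, that is, log the failure and use the lazy unmount.
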
After that I saw sporadic failures when removing the loop device:
Losetup remove /dev/loop14
loop: can't delete device /dev/loop14: Device or resource busy
followed by an fsck failure on ext3fs.img.
I tried to see what is holding the loop device, but with the following
patch applied I can no longer reproduce the failure; the patch probably
adds enough delay to avoid the issue:
--- fs.py.ORIG 2011-03-31 19:53:44.000000000 -0400
+++ fs.py 2011-08-10 12:19:31.219339890 -0400
@@ -320,6 +323,8 @@
if self.device is None:
return
logging.info("Losetup remove %s" % self.device)
+ rc = call(["fuser", "-m", self.device])
+ logging.info("fuser rc=%s" % rc)
rc = call(["/sbin/losetup", "-d", self.device])
self.device = None
@@ -389,6 +394,7 @@
def cleanup(self):
Mount.cleanup(self)
+ logging.info("Mount.cleanup done")
self.disk.cleanup()
def unmount(self):
@@ -396,6 +402,7 @@
logging.info("Unmounting directory %s" % self.mountdir)
rc = call(["/bin/umount", self.mountdir])
if rc == 0:
+ logging.info("umount rc=0, ismount=%s" % os.path.ismount(self.mountdir))
self.mounted = False
else:
logging.warn("Unmounting directory %s failed, using lazy umount" % self.mountdir)
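
Besides fuser, another way to see whether something still has the loop device mounted is to look at /proc/mounts just before calling losetup -d. The following is a sketch only; the function names mounts_using and read_proc_mounts are invented for this example, and it assumes the usual /proc/mounts layout of whitespace-separated fields with the device first and the mount point second.

```python
def mounts_using(device, mounts_text):
    """Return the mount points whose source device equals `device`.

    `mounts_text` is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table as text."""
    with open(path) as handle:
        return handle.read()
```

Logging mounts_using(self.device, read_proc_mounts()) next to the fuser return code would show whether the "Device or resource busy" case coincides with a leftover mount. Note that fuser exits with zero when it finds at least one process using the file and non-zero when it finds none, so rc=0 in the log means the device was in use.
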
We have a continuous livecd build running, so I'll keep watching for
this issue.
Has anyone else hit this issue on F15/F16?
Alan