Adam Litke has uploaded a new change for review.
Change subject: storage: Search only the current image for children ......................................................................
storage: Search only the current image for children
The getChildren method of FileVolume is currently searching all images in the storage domain for children. A glob of all metadata files in the SD is cached and then passed to grep to look for the child volumes. The problem is that deleteVolumes takes an exclusive lock on the image namespace only. If deletes are active on other images at the same time we'll get errors with missing files.
The code mistakenly claims that children can be in any image on the storage domain. It appears that they are in fact limited to the same image. Therefore, the simplest resolution is to limit the search for children to the current image.
Change-Id: I2ef9012cee3a8cb891926510c10ecc47b7cddaa1 Signed-off-by: Adam Litke alitke@redhat.com Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1069610 --- M vdsm/storage/fileVolume.py 1 file changed, 2 insertions(+), 2 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/96/35096/1
diff --git a/vdsm/storage/fileVolume.py b/vdsm/storage/fileVolume.py index a8e9835..8737e39 100644 --- a/vdsm/storage/fileVolume.py +++ b/vdsm/storage/fileVolume.py @@ -361,10 +361,10 @@ def getChildren(self): """ Return children volume UUIDs.
- Children can be found in any image of the volume SD. + Children will always belong to the same image as this volume. """ domPath = self.imagePath.split('images')[0] - metaPattern = os.path.join(domPath, 'images', '*', '*.meta') + metaPattern = os.path.join(domPath, 'images', self.imgUUID, '*.meta') metaPaths = oop.getProcessPool(self.sdUUID).glob.glob(metaPattern) pattern = "%s.*%s" % (volume.PUUID, self.volUUID) matches = grepCmd(pattern, metaPaths)
Adam Litke has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1: Verified+1
Works fine in my tests. Please add other subject matter experts to this so we can make sure I'm not missing a real situation where a child can only be found by searching the metadata files located in another image dir.
oVirt Jenkins CI Server has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/13376/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/12586/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created_staging/152/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/13538/ : FAILURE
Federico Simoncelli has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
For as much as I know this should be correct because with templates we hardlink the metadata in the dependent images.
Can you verify that getVolsOfImage works as expected?
Adam Litke has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
I took a look at the callers of getVolsOfImage and it looks like we'll be protected from volumes disappearing since we're using the Image lock.
Nir Soffer has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1: Code-Review+1
Adam, can you describe how did you verify this? What flows are tested to work with this?
Dan Kenigsberg has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
(1 comment)
http://gerrit.ovirt.org/#/c/35096/1/vdsm/storage/fileVolume.py File vdsm/storage/fileVolume.py:
Line 360: Line 361: def getChildren(self): Line 362: """ Return children volume UUIDs. Line 363: Line 364: Children will always belong to the same image as this volume. Is that true to template volumes, too?
Do we never instantiate a template volume, assuming to produce all dependent entities? Line 365: """ Line 366: domPath = self.imagePath.split('images')[0] Line 367: metaPattern = os.path.join(domPath, 'images', self.imgUUID, '*.meta') Line 368: metaPaths = oop.getProcessPool(self.sdUUID).glob.glob(metaPattern)
Adam Litke has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
(1 comment)
http://gerrit.ovirt.org/#/c/35096/1/vdsm/storage/fileVolume.py File vdsm/storage/fileVolume.py:
Line 360: Line 361: def getChildren(self): Line 362: """ Return children volume UUIDs. Line 363: Line 364: Children will always belong to the same image as this volume.
Is that true to template volumes, too?
That's what StorageDomain.getAllVolumes() is for. The image that contains a template volume must have only that one volume. For images that derive from the template, the template volume is hardlinked into the directory. I've checked the callers of getChildren and none should be confused by a template since either they first check for volume.isShared() or the context only happens for non-template volumes (ie. Live Merge). Line 365: """ Line 366: domPath = self.imagePath.split('images')[0] Line 367: metaPattern = os.path.join(domPath, 'images', self.imgUUID, '*.meta') Line 368: metaPaths = oop.getProcessPool(self.sdUUID).glob.glob(metaPattern)
Federico Simoncelli has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1:
(1 comment)
http://gerrit.ovirt.org/#/c/35096/1//COMMIT_MSG Commit Message:
Line 12: problem is that deleteVolumes takes an exclusive lock on the image Line 13: namespace only. If deletes are active on other images at the same time Line 14: we'll get errors with missing files. Line 15: Line 16: The code mistakenly claims that children can be in any image on the
False. Templates have children _only_ in other images.
Feel free to elaborate on why bz1069610#c3 is wrong.
Having two concurrent deleteImage calls will generate two "grep" processes with cached *.meta files to scan. One of the two calls will fail (No such file or directory) because one .meta file was removed by the other one. Line 17: storage domain. It appears that they are in fact limited to the same Line 18: image. Therefore, the simplest resolution is to limit the search for Line 19: children to the current image. Line 20:
Dan Kenigsberg has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 1: Code-Review+1
(2 comments)
http://gerrit.ovirt.org/#/c/35096/1//COMMIT_MSG Commit Message:
Line 12: problem is that deleteVolumes takes an exclusive lock on the image Line 13: namespace only. If deletes are active on other images at the same time Line 14: we'll get errors with missing files. Line 15: Line 16: The code mistakenly claims that children can be in any image on the
Feel free to elaborate on why bz1069610#c3 is wrong.
Answering for Eduardo:
* getChildren is raceful, but it was raceful before 317b957 * deleteImage does not generate any call to getChildren, as it is done in the storage domain level with sd.getVolsOfImage.
However, I think that parallel mergeSnapshots would show the problem, regardless of the live snapshot and live merge flows. Line 17: storage domain. It appears that they are in fact limited to the same Line 18: image. Therefore, the simplest resolution is to limit the search for Line 19: children to the current image. Line 20:
http://gerrit.ovirt.org/#/c/35096/1/vdsm/storage/fileVolume.py File vdsm/storage/fileVolume.py:
Line 360: Line 361: def getChildren(self): Line 362: """ Return children volume UUIDs. Line 363: Line 364: Children will always belong to the same image as this volume.
False for templates.
Eduardo, we have no intentions to reinroduce getChildren to getVolumeInfo() http://gerrit.ovirt.org/#/c/20004/2/vdsm/storage/volume.py
Adam, would you please change the docstring of this function to match the new semantics?
... for a template's "self-volume", this function returns no children
or maybe even better
... this function should never be called on a template's base volume. Line 365: """ Line 366: domPath = self.imagePath.split('images')[0] Line 367: metaPattern = os.path.join(domPath, 'images', self.imgUUID, '*.meta') Line 368: metaPaths = oop.getProcessPool(self.sdUUID).glob.glob(metaPattern)
Adam Litke has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 2: Verified+1
Dan Kenigsberg has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 2: Code-Review+1
(1 comment)
http://gerrit.ovirt.org/#/c/35096/2//COMMIT_MSG Commit Message:
Line 12: problem is that deleteVolumes takes an exclusive lock on the image Line 13: namespace only. If deletes are active on other images at the same time Line 14: we'll get errors with missing files. Line 15: Line 16: The code mistakenly claims that children can be in any image on the Would you agree to amend the commit message along the lines of:
"The code attempts to report children of a template's base volume. However, this is not needed now and is not expected to be required in the future." Line 17: storage domain. It appears that they are in fact limited to the same Line 18: image. Therefore, the simplest resolution is to limit the search for Line 19: children to the current image. Line 20:
Adam Litke has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 2:
(1 comment)
http://gerrit.ovirt.org/#/c/35096/2//COMMIT_MSG Commit Message:
Line 12: problem is that deleteVolumes takes an exclusive lock on the image Line 13: namespace only. If deletes are active on other images at the same time Line 14: we'll get errors with missing files. Line 15: Line 16: The code mistakenly claims that children can be in any image on the
Would you agree to amend the commit message along the lines of:
Done Line 17: storage domain. It appears that they are in fact limited to the same Line 18: image. Therefore, the simplest resolution is to limit the search for Line 19: children to the current image. Line 20:
Federico Simoncelli has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 3: Code-Review+2
Dan Kenigsberg has submitted this change and it was merged.
Change subject: storage: Search only the current image for children ......................................................................
storage: Search only the current image for children
The getChildren method of FileVolume is currently searching all images in the storage domain for children. A glob of all metadata files in the SD is cached and then passed to grep to look for the child volumes. The problem is that deleteVolumes takes an exclusive lock on the image namespace only. If deletes are active on other images at the same time we'll get errors with missing files.
The code attempts to report the children of a template base volume. However, this is not needed now and is not expected to be required in the future. The getVolsOfImage API should be used for templates. Therefore, we can fix this race by limiting the search for children to a single image.
Change-Id: I2ef9012cee3a8cb891926510c10ecc47b7cddaa1 Signed-off-by: Adam Litke alitke@redhat.com Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1069610 Reviewed-on: http://gerrit.ovirt.org/35096 Reviewed-by: Dan Kenigsberg danken@redhat.com Reviewed-by: Federico Simoncelli fsimonce@redhat.com --- M vdsm/storage/fileVolume.py 1 file changed, 2 insertions(+), 2 deletions(-)
Approvals: Adam Litke: Verified Federico Simoncelli: Looks good to me, approved Dan Kenigsberg: Looks good to me, but someone else must approve
oVirt Jenkins CI Server has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 2:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/13808/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/13019/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/13971/ : FAILURE
oVirt Jenkins CI Server has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 3:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/13811/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/13022/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/13974/ : ABORTED
oVirt Jenkins CI Server has posted comments on this change.
Change subject: storage: Search only the current image for children ......................................................................
Patch Set 4:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master-libfapi_create-rpms-el6-x86_64_merg... : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_create-rpms-fc21-x86_64_merged/256/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master-libfapi_create-rpms-fc20-x86_64_mer... : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_create-rpms_merged_test_debug/473/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master-libfapi_create-rpms-el7-x86_64_merg... : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_create-rpms-fc20-x86_64_merged/271/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_merged/4265/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_create-rpms-el6-x86_64_merged/278/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_create-rpms-el7-x86_64_merged/280/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master-libfapi_create-rpms-fc21-x86_64_mer... : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_verify-error-codes_merged/6103/ : SUCCESS
vdsm-patches@lists.fedorahosted.org