Francesco Romani has uploaded a new change for review.
Change subject: vm: split slow disk access from normalization ......................................................................
vm: split slow disk access from normalization
split utp potentially slow disk access from fast disk normalization.
Change-Id: I6d8cb7a3c58e37540e56f4359aa67ea4dce875cc Signed-off-by: Francesco Romani fromani@redhat.com --- M vdsm/virt/vm.py 1 file changed, 19 insertions(+), 11 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/52/34752/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py index 9740ab3..05b15e9 100644 --- a/vdsm/virt/vm.py +++ b/vdsm/virt/vm.py @@ -1337,6 +1337,10 @@ if 'device' not in drv: drv['device'] = 'disk'
+ drv['truesize'] = 0 + drv['apparentsize'] = 0 + + def _getVdsmImgSizes(self, drv): if drv['device'] == 'disk': res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'], drv['imageID'], drv['volumeID']) @@ -1348,9 +1352,6 @@ # if a key is missing here, is hsm bug and we cannot handle it. drv['truesize'] = res['truesize'] drv['apparentsize'] = res['apparentsize'] - else: - drv['truesize'] = 0 - drv['apparentsize'] = 0
def __legacyDrives(self): """ @@ -1419,14 +1420,7 @@ # Normalize vdsm images for drv in devices[DISK_DEVICES]: if isVdsmImage(drv): - try: - self._normalizeVdsmImg(drv) - except StorageUnavailableError: - # storage unavailable is not fatal on recovery; - # the storage subsystem monitors the devices - # and will notify when they come up later. - if not self.recovering: - raise + self._normalizeVdsmImg(drv)
self.normalizeDrivesIndices(devices[DISK_DEVICES])
@@ -1435,6 +1429,18 @@ self._normalizeBalloonDevice(devices[BALLOON_DEVICES])
return devices + + def _updateDiskSizes(self, drives): + for drv in drives: + if isVdsmImage(drv): + try: + self._getVdsmImgSizes(drv) + except StorageUnavailableError: + # storage unavailable is not fatal on recovery; + # the storage subsystem monitors the devices + # and will notify when they come up later. + if not self.recovering: + raise
def _normalizeBalloonDevice(self, balloonDevices): EMPTY_BALLOON = {'type': BALLOON_DEVICES, @@ -2640,6 +2646,7 @@ def _run(self): self.log.info("VM wrapper has started") devices = self.buildConfDevices() + self._updateDiskSizes(devices[DISK_DEVICES])
# recovery flow note: # we do not start disk stats collection here since @@ -3330,6 +3337,7 @@
if isVdsmImage(diskParams): self._normalizeVdsmImg(diskParams) + self._getVdsmImgSizes(diskParams) self._createTransientDisk(diskParams)
self.updateDriveIndex(diskParams)
oVirt Jenkins CI Server has posted comments on this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Patch Set 1:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/13333/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/13173/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/12383/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_virt_functional_tests_gerrit/1883/ : There was an infra issue, please contact infra@ovirt.org
oVirt Jenkins CI Server has posted comments on this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Patch Set 2:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/13344/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/13184/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/12394/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_virt_functional_tests_gerrit/1887/ : There was an infra issue, please contact infra@ovirt.org
Nir Soffer has posted comments on this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Patch Set 2: Code-Review-1
(1 comment)
http://gerrit.ovirt.org/#/c/34752/2/vdsm/virt/vm.py File vdsm/virt/vm.py:
Line 1337: if 'device' not in drv: Line 1338: drv['device'] = 'disk' Line 1339: Line 1340: drv['truesize'] = 0 Line 1341: drv['apparentsize'] = 0 This is bad idea - this creates an unwanted state where disk size is set to 0, before the we call _getVdsmImgSizes. Another thread may access the wrong size and do the wrong thing based on this information.
We should have only 2 states:
1. We don't know what the disk size because we did not check yet 2. We know the size Line 1342: Line 1343: def _getVdsmImgSizes(self, drv): Line 1344: if drv['device'] == 'disk': Line 1345: res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'],
Nir Soffer has posted comments on this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Patch Set 2:
(2 comments)
http://gerrit.ovirt.org/#/c/34752/2/vdsm/virt/vm.py File vdsm/virt/vm.py:
Line 1339: Line 1340: drv['truesize'] = 0 Line 1341: drv['apparentsize'] = 0 Line 1342: Line 1343: def _getVdsmImgSizes(self, drv): This name is also confusing, as this does not return anything. It should be something like _updateVdsmImageSize.
And using shortcuts such as "Img" in the name does not help help and is uncommon in vdsm. Line 1344: if drv['device'] == 'disk': Line 1345: res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'], Line 1346: drv['imageID'], drv['volumeID']) Line 1347: if res['status']['code'] != 0:
Line 2645: Line 2646: def _run(self): Line 2647: self.log.info("VM wrapper has started") Line 2648: devices = self.buildConfDevices() Line 2649: self._updateDiskSizes(devices[DISK_DEVICES]) This is a good change - it is more clear that we do this step here compared with the current code that hide this operation inside buildConfDevices.
But since we want to split _run to multiple actions (start vm, recover vm, ...), moving this operation into buildConfDevices() is more useful.
I think that removing the bogus size keys (set to 0 because we did not check yet) when normalizing vdsm image and adding them only when updating the size is the correct change. Line 2650: Line 2651: # recovery flow note: Line 2652: # we do not start disk stats collection here since Line 2653: # in the recovery flow irs may not be ready yet.
Francesco Romani has posted comments on this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Patch Set 2:
(3 comments)
this change was the preparation step for the next one (34753). But if, as it now seems clear, we can't split the device normalization (make sure the device config has all the correct fields) from the device initialization (filling the required fields with up to date values), then this is the wrong direction.
http://gerrit.ovirt.org/#/c/34752/2/vdsm/virt/vm.py File vdsm/virt/vm.py:
Line 1337: if 'device' not in drv: Line 1338: drv['device'] = 'disk' Line 1339: Line 1340: drv['truesize'] = 0 Line 1341: drv['apparentsize'] = 0
This is bad idea - this creates an unwanted state where disk size is set to
Good catch, this is a point I definitely overlooked. Please see below for the full context. Line 1342: Line 1343: def _getVdsmImgSizes(self, drv): Line 1344: if drv['device'] == 'disk': Line 1345: res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'],
Line 1339: Line 1340: drv['truesize'] = 0 Line 1341: drv['apparentsize'] = 0 Line 1342: Line 1343: def _getVdsmImgSizes(self, drv):
This name is also confusing, as this does not return anything. It should be
Right. Line 1344: if drv['device'] == 'disk': Line 1345: res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'], Line 1346: drv['imageID'], drv['volumeID']) Line 1347: if res['status']['code'] != 0:
Line 2645: Line 2646: def _run(self): Line 2647: self.log.info("VM wrapper has started") Line 2648: devices = self.buildConfDevices() Line 2649: self._updateDiskSizes(devices[DISK_DEVICES])
This is a good change - it is more clear that we do this step here compare
This is a very good point, but this unfortunately also means that this change of mine doesn't really solve the problem I'm tackling.
In the following patch, I'm trying to solve https://bugzilla.redhat.com/show_bug.cgi?id=1143968 by building on this change.
The problem I observed is the Drive configuration is mutating, and this is the cause of RuntimeError I reported. That's the reason why I added the bogus keys.
But bogus keys are worse than no keys, and you're right about the states we should export (either no data or correct data).
So I'm back to the drawing board :) Line 2650: Line 2651: # recovery flow note: Line 2652: # we do not start disk stats collection here since Line 2653: # in the recovery flow irs may not be ready yet.
Francesco Romani has abandoned this change.
Change subject: vm: split slow disk access from normalization ......................................................................
Abandoned
wrong approach, fixed otherwise.
vdsm-patches@lists.fedorahosted.org