Yaniv Bronhaim has uploaded a new change for review.
Change subject: supervdsmServer is down after failed operation (#851832) ......................................................................
supervdsmServer is down after failed operation (#851832)
https://bugzilla.redhat.com/show_bug.cgi?id=851832
After running operation that throws exception that we don't catch in supervdsmServer, we catch it in ProxyCaller::__call__ method. In this except code we reset supervdsmServer and call the same method again. If the exception is thrown again, we leave supervdsmServer down And this is how supervdsmServer remains. This patch omit the recall and leave the reset to keep on normal flow.
Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Signed-off-by: Yaniv Bronhaim ybronhei@redhat.com --- M vdsm/supervdsm.py 1 file changed, 0 insertions(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/01/7901/1
diff --git a/vdsm/supervdsm.py b/vdsm/supervdsm.py index 79e8d81..75e93e2 100644 --- a/vdsm/supervdsm.py +++ b/vdsm/supervdsm.py @@ -67,7 +67,6 @@ return callMethod() except (IOError, socket.error, AuthenticationError): self._supervdsmProxy._restartSupervdsm() - return callMethod()
class SuperVdsmProxy(object):
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 1 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: supervdsmServer is down after failed operation (#846307) ......................................................................
Patch Set 2: I would prefer that you didn't submit this
(3 inline comments)
Please add a unit test that shows that supervdsm handles IOError-raising functions properly.
.................................................... Commit Message Line 5: CommitDate: 2012-09-10 14:32:31 +0300 Line 6: Line 7: supervdsmServer is down after failed operation (#846307) Line 8: Line 9: https://bugzilla.redhat.com/show_bug.cgi?id=846307 please add the Bug-Id: tag in front. No need to mention the BZ# in the title, too. Line 10: Line 11: After running operation that throws exception that we don't catch in Line 12: supervdsmServer, we catch it in ProxyCaller::__call__ method. Line 13: In this except code we reset supervdsmServer and call the same method
.................................................... File vdsm/supervdsm.py Line 64: getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): the root problem is not touched. If the called method raises an IOError, it is *not* a good reason for supervdsm to die. The process should be restarted only when the supervdsm framework has a bug, not on bugs in its payload. Line 69: self._supervdsmProxy._restartSupervdsm() Line 70: Line 71: Line 72: class SuperVdsmProxy(object):
Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): Line 69: self._supervdsmProxy._restartSupervdsm() silently returning None is a very bad idea in case of an error. Raising an exception would be more reasonable. Line 70: Line 71: Line 72: class SuperVdsmProxy(object): Line 73: """
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 2 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: supervdsmServer is down after failed operation (#846307) ......................................................................
Patch Set 2: Verified
(2 inline comments)
I can't add ut for this specific change. This change is not about a specific io error, its about handling crashes of supervdsmServer. If you have good idea for such tests please tell me.. I thought about adding dummy method in supervdsmServer that raises an error, and the ut will call it and verify that supervdsmServer is still alive afterwards..
.................................................... File vdsm/supervdsm.py Line 64: getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): The IOError was fixed here: http://gerrit.ovirt.org/#/c/7214/6 .
This patch is here to handle such bugs in supervdsmServer without disabling its service Line 69: self._supervdsmProxy._restartSupervdsm() Line 70: Line 71: Line 72: class SuperVdsmProxy(object):
Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): Line 69: self._supervdsmProxy._restartSupervdsm() We need to handle the error inside supervdsmServer.
We can raise an exception in case of a bug (as you said..) , i prefer to add error log and move on. Line 70: Line 71: Line 72: class SuperVdsmProxy(object): Line 73: """
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 2 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 2: (2 inline comments)
.................................................... File vdsm/supervdsm.py Line 64: getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): this is only one example of one possible exception raised inside supervdsm. this commit should handle all possible errors, and differentiate between those originating in a method to those originating in the framework.
payload exceptions should *not* cause supervdsm restart. Line 69: self._supervdsmProxy._restartSupervdsm() Line 70: Line 71: Line 72: class SuperVdsmProxy(object):
Line 65: **kwargs) Line 66: try: Line 67: return callMethod() Line 68: except (IOError, socket.error, AuthenticationError): Line 69: self._supervdsmProxy._restartSupervdsm() This cannot be the responsibility of the supervdsm framework. You don't know who calls the method, and what is he going to do with the None you return to it. If there is an error, you raise and exception. Line 70: Line 71: Line 72: class SuperVdsmProxy(object): Line 73: """
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 2 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 3: I would prefer that you didn't submit this
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 3 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Shu Ming has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 3: I would prefer that you didn't submit this
In general, the exception is swallowed in __call__ in this fix, I think we should throw the exception out. Then the up layer can get a chance to decide if it will call the method again.
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 3 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 3:
In general every method in _SuperVdsm can throw exception that we don't handle. As I see we have 2 options:
1. In _SuperVdsm we can catch every exception, report it and treat it specifically (In each method provides by superVdsmServer proxy) - but this can cause crashes if we'll add method to superVdsmServer without exceptions handling..
2. We can add general activity, as it is now, and handle the exception in ProxyCaller by report it, reset superVdsmServer and then wait for the user to active the action again after distinguish why it didn't work (as in this patch). Or we can call the method again and pray it will work. And than, if it still doesn't, we can die (as before this patch)..
We can work specifically about option1 and check every supervdsmServer method and its exception handling, but this patch comes to solve non-handled cases.
And by the way, if we have different exception than IOError, socket.error, AuthenticationError - supervdsmServer just dies quietly
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 3 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 5: I would prefer that you didn't submit this
(1 inline comment)
.................................................... File vdsm/supervdsm.py Line 64 Line 65 Line 66 Line 67 Line 68 if you added here
_log.error(something, exc_info=True)
and replaced the "return" with "raise" you'd have a clearer code - but still not address the bug. We should not restart supervdsm just because an exception happened within one of its functions. You must differentiate between an exception raised by supervdsm function, and framework exceptions.
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 5 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Royce Lv has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 5: (1 inline comment)
.................................................... File vdsm/supervdsm.py Line 88: finally: Line 89: if err != "": Line 90: self._supervdsmProxy._log.error(err) Line 91: self._supervdsmProxy._restartSupervdsm() Line 92: We can raise a specific exception when supervdsmServer functions error, and catch it here and print the stack then ignore this error. But if vdsm proxy can't comunicate with supervdsmServer, proper action should be restart it. Line 93: Line 94: class SuperVdsmProxy(object): Line 95: """ Line 96: A wrapper around all the supervdsm init stuff
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 5 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 6: I would prefer that you didn't submit this
(3 inline comments)
.................................................... Commit Message Line 7: supervdsmServer is down after failed operation Line 8: Line 9: Bug-Id: https://bugzilla.redhat.com/show_bug.cgi?id=846307 Line 10: Line 11: After running operation that throws exception that we don't catch in throws an exception Line 12: supervdsmServer, we catch it in ProxyCaller::__call__ method. Line 13: In this except code we reset supervdsmServer and call the same method Line 14: again. Line 15: If the exception is thrown again, we leave supervdsmServer down And this
.................................................... File vdsm/supervdsm.py Line 61: def __call__(self, *args, **kwargs): Line 62: callMethod = lambda: \ Line 63: getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, Line 64: **kwargs) Line 65: if self._supervdsmProxy.isSupervdsmUp(): If not ....: self._supervdsmProxy.restartSupervdsm() return .... Line 66: return callMethod() Line 67: else: Line 68: self._supervdsmProxy.restartSupervdsm() Line 69: return callMethod()
.................................................... File vdsm/utils.py Line 852: Line 853: def __unicode__(self): Line 854: return unicode(self.cmd) Line 855: Line 856: def getProcessUpTimeByPID(pid): reuse the one in misc Line 857: try: Line 858: uptime = os.stat('/proc/' + str(pid)).st_mtime Line 859: return uptime Line 860: except Exception as e:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 6 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 9: Looks good to me, but someone else must approve
(1 inline comment)
.................................................... File vdsm/supervdsm.py Line 83: self._authkey, str(os.getpid())] Line 84: misc.execCmd(superVdsmCmd, sync=False, sudo=True) Line 85: sleep(2) Line 86: Line 87: def isSupervdsmUp(self): isRunnning() is sufficient as it's a member method of the superVDSM object Line 88: try: Line 89: with open(PIDFILE, "r") as f: Line 90: spid = int(f.readline().strip()) Line 91: createdTime = int(f.readline().strip())
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 9 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 10: I would prefer that you didn't submit this
(2 inline comments)
.................................................... File tests/superVdsmTests.py Line 1: from testrunner import VdsmTestCase as TestCaseBase Line 2: import supervdsm Add file to Makefile Line 3: Line 4: class TestSuperVdsm(TestCaseBase): Line 5: Line 6: def setUp(self):
.................................................... File vdsm/supervdsm.py Line 43: Line 44: raise RuntimeError("SuperVDSM Server not found") Line 45: Line 46: PIDFILE = os.path.join(constants.P_VDSM_RUN, "svdsm.pid") Line 47: TIMESTEMP = os.path.join(constants.P_VDSM_RUN, "svdsm.time") Add this file to the whitelist in clientif.py _cleanOldFiles() so it wouldn't get deleted Line 48: ADDRESS = os.path.join(constants.P_VDSM_RUN, "svdsm.sock") Line 49: SUPERVDSM = __supervdsmServerPath() Line 50: Line 51:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 10 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 11: I would prefer that you didn't submit this
(1 inline comment)
.................................................... File vdsm/clientIF.py Line 456: for f in os.listdir(constants.P_VDSM_RUN): Line 457: try: Line 458: vmId, fileType = f.split(".", 1) Line 459: if fileType in ["guest.socket", "monitor.socket", "pid", Line 460: "stdio.dump", "recovery"]: , "time" Line 461: if vmId in self.vmContainer: Line 462: continue Line 463: if f == 'vdsmd.pid': Line 464: continue
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 11 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 12: Looks good to me, but someone else must approve
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 12 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 12: I would prefer that you didn't submit this
(4 inline comments)
.................................................... File vdsm/supervdsm.py Line 43: Line 44: raise RuntimeError("SuperVDSM Server not found") Line 45: Line 46: PIDFILE = os.path.join(constants.P_VDSM_RUN, "svdsm.pid") Line 47: TIMESTEMP = os.path.join(constants.P_VDSM_RUN, "svdsm.time") TIMESTAMP Line 48: ADDRESS = os.path.join(constants.P_VDSM_RUN, "svdsm.sock") Line 49: SUPERVDSM = __supervdsmServerPath() Line 50: Line 51:
Line 62: def __call__(self, *args, **kwargs): Line 63: callMethod = lambda: \ Line 64: getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, Line 65: **kwargs) Line 66: if not self._supervdsmProxy.isRunning(): test-and-later-launch is racy. Line 67: self._supervdsmProxy.launch() Line 68: Line 69: try: Line 70: return callMethod()
Line 79: """ Line 80: A wrapper around all the supervdsm init stuff Line 81: """ Line 82: _log = logging.getLogger("SuperVdsmProxy") Line 83: pidfile = PIDFILE I like making the code more testable, but this merits its own patch imo.
If possible, use the same name for the module var and the class var. Line 84: tsfile = TIMESTEMP Line 85: Line 86: def open(self, *args, **kwargs): Line 87: return self._manager.open(*args, **kwargs)
Line 107: try: Line 108: with open(self.pidfile, "r") as f: Line 109: spid = int(f.read().strip()) Line 110: with open(self.tsfile, "r") as f: Line 111: createdTime = f.read().strip() I think that os.stat(pidfile) would give you the same info about creation time, with no need for another file. Line 112: pTime = str(misc.getProcCtime(spid)) Line 113: Line 114: if pTime == createdTime: Line 115: return True
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 12 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 12: (2 inline comments)
.................................................... File vdsm/supervdsm.py Line 79: """ Line 80: A wrapper around all the supervdsm init stuff Line 81: """ Line 82: _log = logging.getLogger("SuperVdsmProxy") Line 83: pidfile = PIDFILE you are right, but this is too complex now... is it mandatory? Line 84: tsfile = TIMESTEMP Line 85: Line 86: def open(self, *args, **kwargs): Line 87: return self._manager.open(*args, **kwargs)
Line 107: try: Line 108: with open(self.pidfile, "r") as f: Line 109: spid = int(f.read().strip()) Line 110: with open(self.tsfile, "r") as f: Line 111: createdTime = f.read().strip() the pid file creation is delayed in few milliseconds from the real process creation (create the file, writing to it and inc). we prefer to write the exact time to external file to be more strict Line 112: pTime = str(misc.getProcCtime(spid)) Line 113: Line 114: if pTime == createdTime: Line 115: return True
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 12 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: supervdsmServer is down after failed operation ......................................................................
Patch Set 13: I would prefer that you didn't submit this
(1 inline comment)
.................................................... File vdsm/supervdsm.py Line 54: Line 55: Line 56: class ProxyCaller(object): Line 57: Line 58: _proxyLock = threading.Lock() Avoid static locks. Pass the lock from the svdsm instance in the ctor. Line 59: Line 60: def __init__(self, supervdsmProxy, funcName): Line 61: self._funcName = funcName Line 62: self._supervdsmProxy = supervdsmProxy
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 13 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: svdsm validation of recovery after crash ......................................................................
Patch Set 14: I would prefer that you didn't submit this
(3 inline comments)
.................................................... File tests/superVdsmTests.py Line 5: Line 6: Line 7: class TestSuperVdsm(TestCaseBase): Line 8: Line 9: def checkSudoer(f): There is already a decorator in testValidation.py Line 10: p = subprocess.Popen(['sudo', '-l', '-n', 'ls'], Line 11: stdin=subprocess.PIPE, stdout=subprocess.PIPE, Line 12: stderr=subprocess.PIPE) Line 13: out, err = p.communicate()
.................................................... File vdsm/supervdsm.py Line 74: return callMethod() Line 75: # handling internal exception that we raise to identify supervdsm Line 76: # validation. only this exception can cause kill! Line 77: except AuthenticationError: Line 78: self._supervdsmProxy.kill() If you just killed SVDSM how do you expect callMethod to work Line 79: return callMethod() Line 80: Line 81: Line 82: class SuperVdsmProxy(object):
Line 115: with open(self.pidfile, "r") as f: Line 116: spid = int(f.read().strip()) Line 117: with open(self.timestamp, "r") as f: Line 118: createdTime = f.read().strip() Line 119: pTime = str(misc.getProcCtime(spid)) There is no need to put this in the context Line 120: Line 121: if pTime == createdTime: Line 122: return True Line 123:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 14 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: svdsm validation of recovery after crash ......................................................................
Patch Set 14: (3 inline comments)
.................................................... File tests/superVdsmTests.py Line 5: Line 6: Line 7: class TestSuperVdsm(TestCaseBase): Line 8: Line 9: def checkSudoer(f): the decorator there does something else.. it gets cmd to run, i just want to verify if i can perform sudo command without enter password, so here i run 'sudo ls' command and if it works i move on. otherwise, i can't execute svdsm Line 10: p = subprocess.Popen(['sudo', '-l', '-n', 'ls'], Line 11: stdin=subprocess.PIPE, stdout=subprocess.PIPE, Line 12: stderr=subprocess.PIPE) Line 13: out, err = p.communicate()
.................................................... File vdsm/supervdsm.py Line 74: return callMethod() Line 75: # handling internal exception that we raise to identify supervdsm Line 76: # validation. only this exception can cause kill! Line 77: except AuthenticationError: Line 78: self._supervdsmProxy.kill() i want to perform the same __call__ function again with same arguments, it will initiate svdsm and try to call the same method again. i'll change it. Line 79: return callMethod() Line 80: Line 81: Line 82: class SuperVdsmProxy(object):
Line 115: with open(self.pidfile, "r") as f: Line 116: spid = int(f.read().strip()) Line 117: with open(self.timestamp, "r") as f: Line 118: createdTime = f.read().strip() Line 119: pTime = str(misc.getProcCtime(spid)) you right.. Line 120: Line 121: if pTime == createdTime: Line 122: return True Line 123:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 14 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Shu Ming has posted comments on this change.
Change subject: svdsm validation of recovery after crash ......................................................................
Patch Set 16: (1 inline comment)
.................................................... File vdsm/supervdsmServer.py Line 86: Line 87: class _SuperVdsm(object): Line 88: Line 89: @logDecorator Line 90: def ping(self, *args, **kwargs): It seems this method is not used any more. Do you still need this method? Line 91: # This method purpose to test if svdsm is up and returns after a call Line 92: return True Line 93: Line 94: @logDecorator
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 16 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Shu Ming has posted comments on this change.
Change subject: svdsm validation of recovery after crash ......................................................................
Patch Set 16: (2 inline comments)
.................................................... File vdsm/supervdsm.py Line 74: try: Line 75: return callMethod() Line 76: # handling internal exception that we raise to identify supervdsm Line 77: # validation. only this exception can cause kill! Line 78: except AuthenticationError: How about the other exceptions? Should we just pass them? Line 79: with _g_singletonSupervdsmInstance_lock: Line 80: self._supervdsmPoxy.kill() Line 81: self._supervdsmProxy.launch() Line 82: return callMethod()
.................................................... File vdsm/supervdsmServer.py Line 86: Line 87: class _SuperVdsm(object): Line 88: Line 89: @logDecorator Line 90: def ping(self, *args, **kwargs): Please ignore it, just understand that it only for test purpose. Line 91: # This method purpose to test if svdsm is up and returns after a call Line 92: return True Line 93: Line 94: @logDecorator
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 16 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: svdsm validation of recovery after crash ......................................................................
Patch Set 16: (1 inline comment)
.................................................... File vdsm/supervdsm.py Line 74: try: Line 75: return callMethod() Line 76: # handling internal exception that we raise to identify supervdsm Line 77: # validation. only this exception can cause kill! Line 78: except AuthenticationError: no, we need to catch them where we call to svdsm, here we ignore and raise them. Only AuthenicationError is handled here because it requires restart Line 79: with _g_singletonSupervdsmInstance_lock: Line 80: self._supervdsmPoxy.kill() Line 81: self._supervdsmProxy.launch() Line 82: return callMethod()
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 16 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: super vdsm validation of recovery after crash ......................................................................
Patch Set 20: Verified
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 20 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: super vdsm validation of recovery after crash ......................................................................
Patch Set 20: I would prefer that you didn't submit this
(1 inline comment)
.................................................... File vdsm/supervdsm.py Line 154: raise Line 155: self._svdsm = self._manager.instance() Line 156: Line 157: def launch(self): Line 158: self._launch() Don't use that stupid naming scheme where the a private method has the same name as a public one.
either change the private one to startProxy() or change the public one to init().
They can't have the same name if they don't do the same thing! Line 159: utils.retry(self._connect, Exception, timeout=60) Line 160: Line 161: def __getattr__(self, name): Line 162: return ProxyCaller(self, name)
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 20 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: super vdsm validation of recovery after crash ......................................................................
Patch Set 21: Looks good to me, but someone else must approve
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 21 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: super vdsm validation of recovery after crash ......................................................................
Patch Set 21: Looks good to me, approved
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 21 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: super vdsm validation of recovery after crash ......................................................................
Patch Set 21: I would prefer that you didn't submit this
(11 inline comments)
.................................................... File tests/superVdsmTests.py Line 7: Line 8: def setUp(self): Line 9: self._proxy = supervdsm.getProxy() Line 10: Line 11: # changing default constants for test. remember to return the origin you have monkeypatch.py for exactly this purpose. (but you do not HAVE to use it if you don't want to). Line 12: # value Line 13: self._originpidfile = self._proxy.pidfile Line 14: self._origintimestempfile = self._proxy.timestamp Line 15: self._originsock = self._proxy.address
Line 14: self._origintimestempfile = self._proxy.timestamp Line 15: self._originsock = self._proxy.address Line 16: Line 17: # temporary values to run temporary svdsm Line 18: self._proxy.pidfile = '/tmp/svdsm.pid' better use tempfile over hard-coded name. What if another user (say, Jenkins) runs the same test on the same machine? Line 19: self._proxy.timestamp = '/tmp/svdsm.time' Line 20: self._proxy.address = '/tmp/svdsm.sock' Line 21: Line 22: def tearDown(self):
Line 25: self._proxy.timestamp = self._origintimestempfile Line 26: self._proxy.address = self._originsock Line 27: Line 28: def testIsSuperUp(self): Line 29: testValidation.checkSudo(['ls']) why do you need the running user to be able to have "sudo ls"? I suppose that checkSudo(['python', "supervdsmServer.py"]) would be more relevant. Line 30: self._proxy.ping() # this call initiate svdsm Line 31: self._proxy.ping() Line 32: Line 33: self.assertTrue(self._proxy.isRunning())
Line 27: Line 28: def testIsSuperUp(self): Line 29: testValidation.checkSudo(['ls']) Line 30: self._proxy.ping() # this call initiate svdsm Line 31: self._proxy.ping() why isn't it enough to call ping() only once? Line 32: Line 33: self.assertTrue(self._proxy.isRunning()) Line 34: Line 35: def testKillSuper(self):
Line 34: Line 35: def testKillSuper(self): Line 36: testValidation.checkSudo(['ls']) Line 37: self._proxy.ping() Line 38: self._proxy.kill() I suppose you'd want to assertFail(isRunning) here. Line 39: self._proxy.ping() # Launching vdsm after kill
.................................................... File vdsm/clientIF.py Line 465: if f == 'respawn.pid': Line 466: continue Line 467: if f == 'svdsm.pid': Line 468: continue Line 469: if f == 'svdsm.sock': it's not your fault, but seems like this file is removed by this function, as it does not have a known "fileType". Is it fine by supervdsm?
yuck. this function begs for a rewrite using sets. (in another patch, please) Line 470: continue Line 471: if f == 'svdsm.time': Line 472: continue Line 473: else:
.................................................... File vdsm/supervdsm.py Line 27: import uuid Line 28: from time import sleep Line 29: import storage.misc as misc Line 30: from vdsm import constants, utils Line 31: import getpass hmm, nice to know about this module. But why do we need to trust the environment? why os.getuid() / os.getguid() are not enough? Line 32: Line 33: _g_singletonSupervdsmInstance = None Line 34: _g_singletonSupervdsmInstance_lock = threading.Lock() Line 35:
Line 86: """ Line 87: A wrapper around all the supervdsm init stuff Line 88: """ Line 89: _log = logging.getLogger("SuperVdsmProxy") Line 90: proxyLock = threading.Lock() Too bad that you continue to mangle functional changes with test-specific ones.
It is bad for readability, and going to make some hard time downstream. We do not have many reviewers, so please make their life easier.
now that Saggi has acked+2 your patch, I do not want to delay it any further. but please avoid this. Line 91: Line 92: def __init__(self): Line 93: self.pidfile = PIDFILE Line 94: self.timestamp = TIMESTAMP
Line 119: try: Line 120: self._log.debug("Killing svdsm") Line 121: with open(self.pidfile, "r") as f: Line 122: spid = int(f.read().strip()) Line 123: self._svdsm.shutdown() # very important before execute kill why is it "very important"? how did we survive without this before? this comments does not help me much. Line 124: misc.execCmd([constants.EXT_KILL, "-9", str(spid)], sudo=True) Line 125: self._cleanOldFiles() Line 126: except Exception as e: Line 127: self._log.debug("Could not kill old super vdsm %s", e)
.................................................... File vdsm/supervdsmServer.py Line 356: servThread.setDaemon(True) Line 357: servThread.start() Line 358: Line 359: log.debug("Changing owner to svdsm files to %s", user) Line 360: chown(address, user, METADATA_GROUP) there's a little bit of asymmetry here: why was it important to pass the user from the caller, but the group can be kept hard-coded? Line 361: chown(timestamp, user, METADATA_GROUP) Line 362: chown(pidfile, user, METADATA_GROUP) Line 363: Line 364: log.debug("Started serving super vdsm object")
Line 367: log.error("Could not start Super Vdsm: %s", e, exc_info=True) Line 368: sys.exit(1) Line 369: finally: Line 370: try: Line 371: os.unlink(address) on the sys.exit(errno.EPERM) path, this would explode, since "address" is not yet defined. Line 372: except OSError: Line 373: pass Line 374: Line 375: if __name__ == '__main__':
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 21 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 24: I would prefer that you didn't submit this
(4 inline comments)
.................................................... File vdsm/supervdsm.py Line 117: misc.execCmd([constants.EXT_KILL, "-9", str(pid)], sudo=True) Line 118: self._cleanOldFiles() Line 119: except Exception, ex: Line 120: self._log.debug("Could not kill old Super Vdsm %s", ex) Line 121: self._cleanOldFiles() nitpick: you could call self._cleanOldFiles() once, on finally, or after the block. Line 122: Line 123: self._authkey = None Line 124: self._manager = None Line 125: self._svdsm = None
Line 124: self._manager = None Line 125: self._svdsm = None Line 126: Line 127: def isRunning(self): Line 128: try: I did not notice this before, but why do you have here this catch-all try-block?
it is a bad practice in general - and I think that specifically here, too. Assume that TIMESTAMP has become unreadable due to an evil admin. Why should we continue to recreate a new superVdsm process? it would be better to crash and burn and report a log of the problem. Line 129: with open(PIDFILE, "r") as f: Line 130: spid = int(f.read().strip()) Line 131: with open(TIMESTAMP, "r") as f: Line 132: createdTime = f.read().strip()
Line 134: if pTime == createdTime: Line 135: return True Line 136: Line 137: except Exception as e: Line 138: self._log.debug("could not read if svdsm is up: %s", e) if for some reason you need to keep this try-block, please add exc_info=True to have more info in the logs. Line 139: Line 140: return False Line 141: Line 142: def _connect(self):
.................................................... File vdsm/supervdsmServer.py Line 348: servThread = threading.Thread(target=server.serve_forever) Line 349: servThread.setDaemon(True) Line 350: servThread.start() Line 351: Line 352: chown(ADDRESS, METADATA_USER, METADATA_GROUP) nitpick: sounds like a good place to use a for-loop. Line 353: chown(TIMESTAMP, METADATA_USER, METADATA_GROUP) Line 354: chown(PIDFILE, METADATA_USER, METADATA_GROUP) Line 355: log.debug("Started serving super vdsm object") Line 356: servThread.join()
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 24 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 26: I would prefer that you didn't submit this
(3 inline comments)
.................................................... File vdsm/supervdsm.py Line 87: A wrapper around all the supervdsm init stuff Line 88: """ Line 89: _log = logging.getLogger("SuperVdsmProxy") Line 90: proxyLock = threading.Lock() Line 91: firstLaunch = True I do not understand why you need this flag - and certainly not why it is class-wide. it should have been an instance variable. Line 92: Line 93: def open(self, *args, **kwargs): Line 94: return self._manager.open(*args, **kwargs) Line 95:
Line 130: createdTime = f.read().strip() Line 131: except IOError as e: Line 132: # pid file and timestamp file must be exist after first launch, Line 133: # otherwise excpetion will be raised to svdsm caller Line 134: if e.errno == ENOENT and self.firstLaunch: it is expected that the files would be missing also after kill(), not only in firstLaunch. Line 135: return False Line 136: Line 137: try: Line 138: pTime = str(misc.getProcCtime(spid))
Line 136: Line 137: try: Line 138: pTime = str(misc.getProcCtime(spid)) Line 139: except OSError: Line 140: # Means pid is not exist, svdsm was killed I think you are not specific enough. OSError can mean a lot of other thing. You should catch only ESRCH. Line 141: return False Line 142: Line 143: if pTime == createdTime: Line 144: return True
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 26 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Saggi Mizrahi has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 29: I would prefer that you didn't submit this
(2 inline comments)
.................................................... File vdsm/supervdsm.py Line 86: """ Line 87: A wrapper around all the supervdsm init stuff Line 88: """ Line 89: _log = logging.getLogger("SuperVdsmProxy") Line 90: proxyLock = threading.Lock() Why is it still a class field instead of an instance field Line 91: Line 92: def __init__(self): Line 93: self._firstLaunch = True Line 94:
Line 96: return self._manager.open(*args, **kwargs) Line 97: Line 98: def _cleanOldFiles(self): Line 99: self._log.warn("Cleanning svdsm old files: %s, %s, %s", Line 100: PIDFILE, TIMESTAMP, ADDRESS) Why is that warning or even note worthy? Line 101: utils.rmFile(PIDFILE) Line 102: utils.rmFile(TIMESTAMP) Line 103: utils.rmFile(ADDRESS) Line 104:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 29 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 30: I would prefer that you didn't submit this
(1 inline comment)
.................................................... File vdsm/clientIF.py Line 454: Line 455: def _cleanOldFiles(self): Line 456: known_exts = ("guest.socket", "pid", "recovery") Line 457: dontDel = ("vdsmd.pid", "respawn.pid", "svdsm.pid", "svdsm.sock", Line 458: "svdsm.time") this is NOT needed, as "time" is not in known_exts.
and more than that - the whole dependency on two unrelated changes (to cleanOldFiles and rmFile) adds needless complexity. Line 459: Line 460: fNames = [fName for fName in os.listdir(constants.P_VDSM_RUN) Line 461: if fName not in dontDel] Line 462: for fName in fNames:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 30 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 31: Looks good to me, approved
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 31 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 31: Verified
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 31 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 31:
I'm taking this even though its unit tests are not ready yet. Please do not forget pushing for http://gerrit.ovirt.org/#/c/8600/
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 31 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Dan Kenigsberg has submitted this change and it was merged.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
SuperVdsm validation of recovery after crash
When calling svdsm we verify if svdsm is up by reading its pid create time and compare it to timestamp that is written every svdsm initialization.
Bug-Id: https://bugzilla.redhat.com/show_bug.cgi?id=846307 Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Signed-off-by: Yaniv Bronhaim ybronhei@redhat.com --- M vdsm/supervdsm.py M vdsm/supervdsmServer.py 2 files changed, 73 insertions(+), 20 deletions(-)
Approvals: Yaniv Bronhaim: Verified Dan Kenigsberg: Looks good to me, approved
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: merged Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 31 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
Zhou Zheng Sheng has posted comments on this change.
Change subject: SuperVdsm validation of recovery after crash ......................................................................
Patch Set 32: (2 inline comments)
I find this patch when reviewing http://gerrit.ovirt.org/#/c/9691/ .
.................................................... File vdsm/supervdsm.py Line 68: # getting inside only when svdsm is down. its rare case so we Line 69: # don't care that isRunning will run twice Line 70: with self._supervdsmProxy.proxyLock: Line 71: if not self._supervdsmProxy.isRunning(): Line 72: self._supervdsmProxy.launch() Regarding the bug mentioned in the bugzilla, this patch is to make a distinguish between the payload exception and exception from the framework itself.
In the original code, if the remote super vdsm server is not up or there are some problems with it, IOError and socket.error will be caught. This is mixing with the IOError and socket.error in the payload. So this patch just stop catching those exceptions and check if the remote server is OK before perform the call.
However this is not very nice I think. Firstly, there is a time window between the check and the actual remote call. The remote server can die in this window so we will get IOError or socket.error bellow. Secondly, if the remote server dies while it is performing a call, a IOError and socket.error is still raised.
Though the above two situations are rare and this piece of code may survive most of the time, I still think solving the problem this way is not very nice.
To make a distinguish between the payload exception and the the exception from the super vdsm itself. We can apply the following solution.
class _SuperVdsm(object): @logDecorator def someOperation(self, *args, **kwargs): ex = None stack = None r = None
try: r = _performOperation(*args, **kwargs) except Exception as e: ex = e stack = sys.exc_info()[2]
return (r, (ex, stack))
For each operation in _SuperVdsm, wrap it in a try except, probably we can create a decorator to do that.
Then in ProxyCaller.__call__(), we can parse the result like this.
try: r, error = callMethod() # handling internal exception that we raise to identify supervdsm # validation. only this exception can cause kill! except (IOError, socket.error, AuthenticationError): with self._supervdsmProxy.proxyLock: self._supervdsmProxy.kill() self._supervdsmProxy.launch() r, error = callMethod()
if error: ex, stack = error raise ex, None, stack return r
Then we can continue to throw and catch whatever exception in the super vdsm server. These exceptions are kept from the payload exceptions cleanly.
How does this proposal sound like? Line 73: Line 74: try: Line 75: return callMethod() Line 76: # handling internal exception that we raise to identify supervdsm
Line 147: return False Line 148: else: Line 149: raise Line 150: Line 151: if pTime == createdTime: I'm very confused. Under what condition will "pTime != createdTime". I think pTime will always equals to createdTime. Because the content in the pid file and timestamp file are updated always at the same time by the process who updates the pid file. Line 152: return True Line 153: else: Line 154: return False Line 155:
-- To view, visit http://gerrit.ovirt.org/7901 To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment Gerrit-Change-Id: Idad4a622b82259b777851d1b0c1b37ec8da2b01e Gerrit-PatchSet: 32 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Barak Azulay bazulay@redhat.com Gerrit-Reviewer: Dan Kenigsberg danken@redhat.com Gerrit-Reviewer: Douglas Schilling Landgraf dougsland@redhat.com Gerrit-Reviewer: Eduardo ewarszaw@redhat.com Gerrit-Reviewer: Igor Lvovsky ilvovsky@redhat.com Gerrit-Reviewer: Royce Lv lvroyce@linux.vnet.ibm.com Gerrit-Reviewer: Saggi Mizrahi smizrahi@redhat.com Gerrit-Reviewer: Shu Ming shuming@linux.vnet.ibm.com Gerrit-Reviewer: Yaniv Bronhaim ybronhei@redhat.com Gerrit-Reviewer: Yeela Kaplan ykaplan@redhat.com
vdsm-patches@lists.fedorahosted.org