Gitweb:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=467d015c53e...
Commit: 467d015c53e5e2d00025708cda95b3df41ebda1f
Parent: d88584bd640700c51692198d2f6aeda0e773165c
Author: Ryan McCabe <rmccabe(a)redhat.com>
AuthorDate: Mon Jul 16 10:57:28 2012 -0400
Committer: Ryan McCabe <rmccabe(a)redhat.com>
CommitterDate: Mon Jul 16 11:05:26 2012 -0400
rgmanager: Fix for services stuck in recovery
Patch from John Ruemker <jruemker(a)redhat.com>:
"When starting rgmanager throughout the cluster around the same
time, multiple nodes may end up acting as the "root" for a particular
service. If that service happens to fail on startup, you can end up
with each of those nodes sending remote-start requests around the
cluster. Eventually the service will get stuck in a recovering state,
and cannot be modified in any way with clusvcadm. The only remedy we've
found is to kill rgmanager and start it back up."
Acked-by: Lon Hohberger <lhh(a)redhat.com>
Signed-off-by: Ryan McCabe <rmccabe(a)redhat.com>
---
rgmanager/src/daemons/groups.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/rgmanager/src/daemons/groups.c b/rgmanager/src/daemons/groups.c
index 20ed2e1..bd406c8 100644
--- a/rgmanager/src/daemons/groups.c
+++ b/rgmanager/src/daemons/groups.c
@@ -747,7 +747,8 @@ eval_groups(int local, uint32_t nodeid, int nodeStatus)
             (svcStatus.rs_state == RG_STATE_STARTED ||
              svcStatus.rs_state == RG_STATE_RECOVER ||
              svcStatus.rs_state == RG_STATE_STARTING ||
-             svcStatus.rs_state == RG_STATE_STOPPING )) {
+             svcStatus.rs_state == RG_STATE_STOPPING ||
+             svcStatus.rs_state == RG_STATE_ERROR)) {
                 clulog(LOG_DEBUG,
                        "Marking %s on down member %d as stopped",
@@ -789,7 +790,8 @@ eval_groups(int local, uint32_t nodeid, int nodeStatus)
         /* Disabled/failed/in recovery? Do nothing */
         if ((svcStatus.rs_state == RG_STATE_DISABLED) ||
             (svcStatus.rs_state == RG_STATE_FAILED) ||
-            (svcStatus.rs_state == RG_STATE_RECOVER)) {
+            (svcStatus.rs_state == RG_STATE_RECOVER) ||
+            (svcStatus.rs_state == RG_STATE_ERROR)) {
             continue;
         }
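
The two hunks above only extend state checks inside eval_groups(). As a
reading aid, the sketch below isolates those checks as two standalone
predicates. It is an illustration under assumptions, not rgmanager code: the
enum, its values, and both function names are hypothetical, and the first
predicate covers only the state half of a larger condition whose beginning is
not shown in the diff.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the service states named in the diff.
 * The values are arbitrary and are NOT taken from the rgmanager headers.
 */
enum demo_rg_state {
    DEMO_STATE_STOPPED,
    DEMO_STATE_STARTING,
    DEMO_STATE_STARTED,
    DEMO_STATE_STOPPING,
    DEMO_STATE_FAILED,
    DEMO_STATE_ERROR,
    DEMO_STATE_RECOVER,
    DEMO_STATE_DISABLED
};

/*
 * State part of the first hunk: a service in one of these states is
 * eligible for the "Marking ... on down member ... as stopped" branch.
 * The patch adds the ERROR state to the list.
 */
static bool demo_state_resets_on_down_member(enum demo_rg_state s)
{
    return s == DEMO_STATE_STARTED ||
           s == DEMO_STATE_RECOVER ||
           s == DEMO_STATE_STARTING ||
           s == DEMO_STATE_STOPPING ||
           s == DEMO_STATE_ERROR;      /* added by the patch */
}

/*
 * Second hunk: services in these states are skipped ("Do nothing") by
 * the evaluation loop. The patch adds the ERROR state here as well.
 */
static bool demo_state_is_skipped(enum demo_rg_state s)
{
    return s == DEMO_STATE_DISABLED ||
           s == DEMO_STATE_FAILED ||
           s == DEMO_STATE_RECOVER ||
           s == DEMO_STATE_ERROR;      /* added by the patch */
}
```

With these predicates, a service in the error state both qualifies for the
down-member branch and is skipped by the evaluation loop, which matches the
two additions in the diff.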