fence_sanlock/Makefile | 3
fence_sanlock/fence_sanlock.8 | 203 +++++++++++++++++++++++++++++++++++++++++
fence_sanlock/fence_sanlockd.8 | 39 +++++++
3 files changed, 245 insertions(+)
New commits:
commit 732f8998d2b68ad4363989f6aadd2225d538527f
Author: David Teigland <teigland(a)redhat.com>
Date: Fri Sep 28 15:18:32 2012 -0500
fence_sanlock: add man pages
Signed-off-by: David Teigland <teigland(a)redhat.com>
diff --git a/fence_sanlock/Makefile b/fence_sanlock/Makefile
index 3c0752c..99a9a5a 100644
--- a/fence_sanlock/Makefile
+++ b/fence_sanlock/Makefile
@@ -53,4 +53,7 @@ MANDIR=/usr/share/man
.PHONY: install
install: all
$(INSTALL) -d $(DESTDIR)/$(BINDIR)
+ $(INSTALL) -d $(DESTDIR)/$(MANDIR)/man8
$(INSTALL) -c -m 755 $(TARGET1) $(TARGET2) $(DESTDIR)/$(BINDIR)
+ $(INSTALL) -m 644 fence_sanlock.8 $(DESTDIR)/$(MANDIR)/man8/
+ $(INSTALL) -m 644 fence_sanlockd.8 $(DESTDIR)/$(MANDIR)/man8/
diff --git a/fence_sanlock/fence_sanlock.8 b/fence_sanlock/fence_sanlock.8
new file mode 100644
index 0000000..3c1d472
--- /dev/null
+++ b/fence_sanlock/fence_sanlock.8
@@ -0,0 +1,203 @@
+.TH FENCE_SANLOCK 8 2012-09-26
+
+.SH NAME
+fence_sanlock \- fence agent using watchdog and shared storage leases
+
+.SH SYNOPSIS
+.B fence_sanlock
+[OPTIONS]
+
+.SH DESCRIPTION
+fence_sanlock uses the watchdog device to reset nodes, in conjunction with
+three daemons: fence_sanlockd, sanlock, and wdmd.
+
+The watchdog device, controlled through /dev/watchdog, is available when a
+watchdog kernel module is loaded. A module should be loaded for the
+available hardware. If no hardware watchdog is available, or no module is
+loaded, the "softdog" module will be loaded, which emulates a hardware
+watchdog device.
+
+Shared storage must be configured for sanlock to use from all hosts. This
+is generally an lvm lv (non-clustered), but could be another block device,
+or NFS file. The storage should be 1GB of fully allocated space. After
+being created, the storage must be initialized with the command:
+.br
+# fence_sanlock -o sanlock_init -p /path/to/storage
+
+The fence_sanlock agent uses sanlock leases on shared storage to verify
+that hosts have been reset, and to notify fenced nodes that are still
+running, that they should be reset.
+
+The fence_sanlockd init script starts the wdmd, sanlock and fence_sanlockd
+daemons before the cluster or fencing systems are started (e.g. cman,
+corosync and fenced). The fence_sanlockd daemon is started with the -w
+option so it waits for the path and host_id options to be provided when
+they are available.
+
+Unfencing must be configured for fence_sanlock in cluster.conf. The cman
+init script does unfencing by running fence_node -U, which in turn runs
+fence_sanlock with the "on" action and local path and host_id values taken
+from cluster.conf. fence_sanlock in turn passes the path and host_id
+values to the waiting fence_sanlockd daemon. With these values,
+fence_sanlockd joins the sanlock lockspace and acquires a resource lease
+for the local host. It can take several minutes to complete these
+unfencing steps.
+
+Once unfencing is complete, the node is a member of the sanlock lockspace
+named "fence" and the node's fence_sanlockd process holds a resource lease
+named "hN", where N is the node's host_id. (To verify this, run the
+commands "sanlock client status" and "sanlock client host_status", which
+show state from the sanlock daemon, or "sanlock direct dump <path>" which
+shows state from shared storage.)
+
+When fence_sanlock fences a node, it tries to acquire that node's resource
+lease. sanlock will not grant the lease until the owner (the node being
+fenced) has been reset by its watchdog device. The time it takes to
+acquire the lease is 140 seconds from the victim's last lockspace renewal
+timestamp on the shared storage. Once acquired, the victim's lease is
+released, and fencing completes successfully.
+
+Live nodes being fenced
+
+When a live node is being fenced, fence_sanlock will continually fail to
+acquire the victim's lease, because the victim continues to renew its
+lockspace membership on storage, and the fencing node sees it is alive.
+This is by design. As long as the victim is alive, it must continue to
+renew its lockspace membership on storage. The victim must not allow the
+remote fence_sanlock to acquire its lease and consider it fenced while it
+is still alive.
+
+At the same time, a victim knows that when it is being fenced, it should
+be reset to avoid blocking recovery of the rest of the cluster. To
+communicate this, fence_sanlock makes a "request" on storage for the
+victim's resource lease. On the victim, fence_sanlockd, which holds the
+resource lease, is configured to receive SIGUSR1 from sanlock if anyone
+requests its lease. Upon receiving the signal, fence_sanlockd knows that
+it is a fencing victim. In response to this, fence_sanlockd allows its
+wdmd connection to expire, which in turn causes the watchdog device to
+fire, resetting the node.
+
+The watchdog reset will obviously have the effect of stopping the victim's
+lockspace membership renewals. Once the renewals stop, fence_sanlock will
+finally be able to acquire the victim's lease after waiting a fixed time
+from the final lockspace renewal.
+
+Loss of shared storage
+
+If access to shared storage with sanlock leases is lost for 80 seconds,
+sanlock is not able to renew the lockspace membership, and enters
+recovery. This causes sanlock clients holding leases, such as
+fence_sanlockd, to be notified that their leases are being lost. In
+response, fence_sanlockd must reset the node, much as if it was being
+fenced.
+
+Daemons killed/crashed/hung
+
+If sanlock, fence_sanlockd daemons are killed abnormally, or crash or
+hang, their wdmd connections will expire, causing the watchdog device to
+fire, resetting the node. fence_sanlock from another node will then run
+and acquire the victim's resource lease. If the wdmd daemon is killed
+abnormally or crashes or hangs, it will not pet the watchdog device,
+causing it to fire and reset the node.
+
+Time Values
+
+The specific times periods referenced above, e.g. 140, 80, are based on
+the default sanlock i/o timeout of 10 seconds. If sanlock is configured
+to use a different i/o timeout, these numbers will be different.
+
+.SH OPTIONS
+
+.BI \-o " action"
+ The agent action:
+
+.IP
+.B on
+.br
+Enable the local node to be fenced. Used by unfencing.
+
+.IP
+.B off
+.br
+Disable another node.
+
+.IP
+.B status
+.br
+Test if a node is on or off. A node is on if it's lease is held, and off
+is it's lease is free.
+
+.IP
+.B metadata
+.br
+Print xml description of required parameters.
+
+.IP
+.B sanlock_init
+.br
+Initialize sanlock leases on shared storage.
+
+.PP
+
+.BI \-p " path"
+ The path to shared storage with sanlock leases.
+
+.PP
+
+.BI \-i " host_id"
+ The host_id, from 1-128.
+
+.SH STDIN PARAMETERS
+
+Options can be passed on stdin, with the format key=val. Each key=val
+pair is separated by a new line.
+
+action=on|off|status
+.br
+See \-o
+
+path=/path/to/shared/storage
+.br
+See \-p
+
+host_id=num
+.br
+See \-i
+
+.SH FILES
+
+Example cluster.conf configuration for fence_sanlock:
+
+.nf
+<clusternode name="node01" nodeid="1">
+ <fence>
+ <method name="1">
+ <device name="wd" host_id="1"/>
+ </method>
+ </fence>
+ <unfence>
+ <device name="wd" host_id="1" action="on"/>
+ </unfence>
+</clusternode>
+
+<clusternode name="node02" nodeid="2">
+ <fence>
+ <method name="1">
+ <device name="wd" host_id="2"/>
+ </method>
+ </fence>
+ <unfence>
+ <device name="wd" host_id="2" action="on"/>
+ </unfence>
+</clusternode>
+
+<fencedevice name="wd" agent="fence_sanlock" device="/dev/fence/leases"/>
+.fi
+
+.SH SEE ALSO
+.BR fence_sanlockd (8),
+.BR sanlock (8),
+.BR wdmd (8),
+.BR fence_node (8),
+.BR fenced (8)
+
diff --git a/fence_sanlock/fence_sanlockd.8 b/fence_sanlock/fence_sanlockd.8
new file mode 100644
index 0000000..33ce051
--- /dev/null
+++ b/fence_sanlock/fence_sanlockd.8
@@ -0,0 +1,39 @@
+.TH FENCE_SANLOCKD 8 2012-09-26
+
+.SH NAME
+fence_sanlockd \- daemon for fence_sanlock agent
+
+.SH SYNOPSIS
+.B fence_sanlockd
+[OPTIONS]
+
+.SH DESCRIPTION
+The fence_sanlockd daemon is used by the fence_sanlock agent.
+See
+.BR fence_sanlock (8),
+for full description.
+
+.SH OPTIONS
+
+.B \-D
+ Enable debugging to stderr and don't fork.
+
+.BI \-p " path"
+ Path to shared storage with sanlock leases.
+
+.BI \-i " host_id"
+ Local sanlock host_id (1-128).
+
+.B \-w
+ Wait for fence_sanlockd -s to send options (p,i).
+
+.B \-s
+ Send options (p,i) to waiting fence_sanlockd -w.
+
+.SH SEE ALSO
+.BR fence_sanlock (8),
+.BR sanlock (8),
+.BR wdmd (8),
+.BR fence_node (8),
+.BR fenced (8)
+