Termination Detection

A distributed computation has terminated in a state in which all agents are waiting to receive messages and all channels are empty. A termination detection algorithm is executed by the operating system to determine whether a client computation has terminated.

The Problem

A computation terminates in states in which all channels are empty and all agents are waiting to receive messages. An agent is said to be active while it is processing a message and idle while it is waiting to process a message. A terminated state is one in which all agents are idle and all channels are empty.

Let \(C_{s}\) and \(C_{r}\) be the numbers of messages sent and received (respectively) on channel \(C\). A computation is in a terminated state exactly when:

All agents are idle and
for all channels \(C\): \(\; C_{s} = C_{r}\).

The problem is to design an algorithm that detects whether the computation is in a terminated state.

How Should You Solve the Problem?

Strategy

Use one of the strategies described earlier. Let's use observers and optimize the observer-based global snapshot algorithm for the special case of termination detection.

There exists a consistent cut (past, future) of a dataflow graph exactly when every edge to a past vertex is from a past vertex. In our algorithms, past is the set of steps (vertices) at each agent before the agent takes its local snapshot. This guarantees that every agent edge to a past vertex is from a past vertex. The algorithm must also ensure that every message edge to a past vertex is from a past vertex. To do so we use the following observation.

Observation

Let past be any set of steps in a computation. For a channel \(C\), let \(C_{s}\) and \(C_{r}\) be the numbers of messages sent and received, respectively, on channel \(C\), in past.

Every message received along \(C\) in a step in past is sent in a step in past exactly when:

\(\; C_{s} \geq C_{r}\)

The observation suggests the following algorithm.

A Termination Detection Algorithm

Agent Actions

When an agent changes state from active to idle (i.e., the agent starts waiting to receive a message) the agent sends a message to the observer. The message to the observer contains \(C_{s}\) for each output channel \(C\) where \(C_{s}\) is the number of messages sent on channel \(C\) up to the point at which the agent becomes idle. The message also contains \(C_{r}\) for each input channel of the agent where \(C_{r}\) is the number of messages received on channel \(C\) up to this point.

Observer Actions

The observer keeps only the latest message that it receives from each agent. For each channel \(C\), let \(C_{s}^{*}\) and \(C_{r}^{*}\) be the latest value of \(C_{s}\) and \(C_{r}\), respectively, that the observer has received.

Initial Condition

All agents are idle. \(C_{s}^{*}\) and \(C_{r}^{*}\) are the numbers of messages sent and received (respectively) on channel \(C\) for all \(C\). Some channels may contain messages.

The Termination Detection Condition:

The observer detects that computation has terminated if

For all channels \(C\): \(C_{s}^{*} = C_{r}^{*}\).

Proof of Correctness

We first prove that if the observer detects that the computation has terminated then the computation has indeed terminated.

Assume that the observer has detected termination. Then for each channel \(C\), either \(C_{s}^{*} = C_{r}^{*}\) initially, or the observer received messages containing \(C_{s}^{*}\) and \(C_{r}^{*}\) such that \(C_{s}^{*} = C_{r}^{*}\).

Let(past, future) be a partition of the steps of the computation where past consists of steps at an agent before the agent sent the messages to the observer containing \(C_{s}^{*}\) and \(C_{r}^{*}\) for channels \(C\) incident on the agent. (If the agent sends no messages to the observer then the agent has no steps in past other than the initialization step.)

Because \(C_{s}^{*} \geq C_{r}^{*}\) for all channels \(C\) it follows that (past, future) is a consistent cut.

From the definition of terminated state it follows that the state at this consistent cut is a terminated state. Termination is a stable property -- once computation has terminated it remains terminated. So, if the state at a consistent cut of the computation is a terminated state then all succeeding states are terminated states.

Next we prove that if the computation terminates then the observer detects termination. Assume that the computation has terminated. The last message sent by each agent has counts \(C_{s}^{*}\) and \(C_{r}^{*}\) of the numbers of messages sent and received (respectively) for each of its incident channels \(C\). Because these are the last messages sent when the algorithm terminates it follows that \(C_{s}^{*} = C_{r}^{*}\) for all \(C\).

Next database deadlock detection

K. Mani Chandy, Emeritus Simon Ramo Professor, California Institute of Technology