Distributed Algorithms Contents Index

Global Snapshot Algorithm: Details

This webpage gives some details of the global snapshot algorithm.

Global Snapshot: Details

This webpage gives a code outline for the global snapshot algorithm and gives examples of steps in the algorithm.
Code Structure
The code outline is given below in Python. channels_recorded is a dict (dictionary), where channels_recorded[sender] becomes True when the snapshot for the channel from this sender has finished being recorded. channel_snapshots is a dict where channel_snapshots[sender] is the ongoing recording of the snapshot of the channel from the sender.
taken_local_snapshot = False
channel_snapshots = {key: [] for key in predecessors}
channels_recorded = {key: False for key in predecessors}

start()
def receive(message, sender):
   if isinstance(message, Marker) and not taken_local_snapshot:
        local_snapshot = record_state()
        taken_local_snapshot = True
        channels_recorded[sender] = True
        output_message = Marker()
        for receiver in successors:
            send(output_message, receiver)
   elif isinstance(message, Marker) and taken_local_snapshot:
        channels_recorded[sender] = True
   else: 
        if taken_local_snapshot and not channels_recorded[sender]:
           channel_snapshots[sender] = \\ 
                       channel_snapshots[sender].append(message)
The remainder of this webpage consists of examples.

Examples

Example: Snapshots may change a Client's Computation

This example shows that the OS algorithm may change a client's computation though it does not change the client's dataflow.

Figure 2 is a representation of a computation with event sequence \([0, 1, 2, \ldots, ]\) and agents \(X, Y, Z\) without a concurrent OS algorithm, and figure 3 shows how the OS changes this computation. Events later in the computation are placed to the right of earlier events.

Fig2
Fig.2: Representation of a Computation without Snapshots

Figure 3 shows how a client's computation is changed when the OS takes snapshots. The local snapshots taken by agents are shown as a yellow circle on the agents' timelines. The OS delays event 3 so that it occurs after events 4, 5, 6, and 7, as shown in the figure. The OS changes the computation, but it does not change the dataflow.

Fig2
Fig.3: The OS changes a Client's Computation
In figure 3, the pre-snapshot events are 0, 1, 2, 4, 6. There is only one message received in a pre-snapshot event, namely the message represented by the edge (0, 2). So, every message received in a pre-snapshot event is sent in a pre-snapshot event. The figure shows that the set of pre-snapshot events is closed.

Example: Steps in a Global Snapshot Algorithm: Initiation

Figure 4 illustrates the first step of the algorithm.
Fig4
Fig.4: Agent Sends Markers when it Takes its Local Snapshot
Agent Y takes its local snapshot shown as a yellow vertex on Y's timeline. When Y takes its snapshot it sends markers on its output channels. The markers are shown as green edges in the figure.

When agents X and Z each receive the markers, they take their local snapshots because they haven't taken snapshots earlier.

Fig5
Fig.5: Agents Take Local Snapshots when they Receive Markers
The actions by X and Z of taking their snapshots are shown as yellow vertices on their timelines in figure 5.

Example: Agents take Snapshots upon Receiving Markers

When X and Z take their snapshots they send markers out on their output channels. The markers sent by X are shown in figure 7. The markers sent by Z are not shown in the figure.

Fig6
Fig.6: When an Agent takes its Snapshot it sends Markers.

Example: Snapshot of a Channel

Figure 8 shows how agent Y determines the state of the channel from X to Y in the global snapshot. Y starts recording the messages it receives along this channel after Y takes its snapshot and stops the recording when it receives a marker on this channel The only message in this interval is the message corresponding to edge (6, 7).

Fig7
Fig.7: Example: Recording a Channel State
The message corresponding to edge \((0, 2)\) is from X to Y but is not in the snapshot of the channel because both \(0\) and \(2\) are pre-snapshot events. Likewise, the message corresponding to edge \((12, 13)\) is from X to Y but is not in the snapshot of the channel because both \(12\) and \(13\) are post-snapshot events. The message corresponding to edge \((6, 7)\) was sent in a pre-snapshot event and received in a post-snapshot event, and so it is in the snapshot of the channel.

Next

Next logical clocks.

K. Mani Chandy, Emeritus Simon Ramo Professor, California Institute of Technology