This webpage gives some details of the global snapshot algorithm.
Global Snapshot: Details
This webpage gives a code outline for the global snapshot algorithm
and gives examples of steps in the algorithm.
Code Structure
The code outline is given below in Python.
channels_recorded is a dict (dictionary), where
channels_recorded[sender] becomes True when the snapshot for
the channel from this sender has finished being recorded.
channel_snapshots is a dict where
channel_snapshots[sender] is the ongoing recording of the
snapshot of the channel from the sender.
taken_local_snapshot = False
channel_snapshots = {key: [] for key in predecessors}
channels_recorded = {key: False for key in predecessors}
start()
def receive(message, sender):
if isinstance(message, Marker) and not taken_local_snapshot:
local_snapshot = record_state()
taken_local_snapshot = True
channels_recorded[sender] = True
output_message = Marker()
for receiver in successors:
send(output_message, receiver)
elif isinstance(message, Marker) and taken_local_snapshot:
channels_recorded[sender] = True
else:
if taken_local_snapshot and not channels_recorded[sender]:
channel_snapshots[sender] = \\
channel_snapshots[sender].append(message)
The remainder of this webpage consists of examples.
Examples
Example: Snapshots may change a Client's Computation
This example shows that the OS algorithm may change a client's
computation though it does not change the client's dataflow.
Figure 2 is a representation of a computation with event sequence
\([0, 1, 2, \ldots, ]\) and agents \(X, Y, Z\) without a concurrent OS
algorithm, and figure 3 shows how the OS changes this computation.
Events later in the computation are placed to the
right of earlier events.
Fig.2: Representation of a Computation without Snapshots
Figure 3 shows how a client's computation is changed when the OS takes
snapshots.
The local snapshots taken by agents are shown as a yellow
circle on the agents' timelines.
The OS delays event 3 so that it occurs after events 4, 5, 6, and 7,
as shown in the figure.
The OS changes the computation, but it does not change the dataflow.
Fig.3: The OS changes a Client's Computation
In figure 3, the pre-snapshot events are 0, 1, 2, 4, 6.
There is only one message received in a pre-snapshot
event, namely the message represented by the edge (0, 2).
So, every message received in a pre-snapshot event is sent in a
pre-snapshot event.
The figure shows that the set of pre-snapshot events is closed.
Example: Steps in a Global Snapshot
Algorithm: Initiation
Figure 4 illustrates the first step of the algorithm.
Fig.4: Agent Sends Markers when it Takes its Local
Snapshot
Agent Y takes its local snapshot shown as a yellow vertex on Y's
timeline.
When Y takes its snapshot it sends markers on its output channels.
The markers are shown as green edges in the figure.
When agents X and Z each receive the markers, they take their local
snapshots because they haven't taken snapshots earlier.
Fig.5: Agents Take Local Snapshots when they Receive Markers
The actions by X and Z of taking their snapshots are shown as yellow
vertices on their timelines in figure 5.
Example: Agents take Snapshots upon Receiving Markers
When X and Z take their snapshots they send markers out on their
output channels.
The markers sent by X are shown in figure 7.
The markers sent by Z are not shown in the figure.
Fig.6: When an Agent takes its Snapshot it sends Markers.
Example: Snapshot of a Channel
Figure 8 shows how agent Y determines the state of the channel from X
to Y in the global snapshot.
Y starts recording the messages it receives along this channel after Y
takes its snapshot and stops the recording when it receives a marker
on this channel
The only message in this interval is the message corresponding to edge
(6, 7).
Fig.7: Example: Recording a Channel State
The message corresponding to edge \((0, 2)\) is from X to Y but is not
in the snapshot of
the channel because both \(0\) and \(2\) are pre-snapshot events.
Likewise, the message corresponding to edge \((12, 13)\) is from X to
Y but is not in the snapshot of
the channel because both \(12\) and \(13\) are post-snapshot events.
The message corresponding to edge \((6, 7)\) was sent in a
pre-snapshot event and received in a post-snapshot event, and so it is
in the snapshot of the channel.