Consider an example of two agents sending tokens to each other. Each agent sends a token that it holds to the other agent. So there is a computation in which tokens go back and forth between the agents forever. Let's call this distributed system \(D\).
Now consider a system consisting of two identical, and totally independent, copies of \(D\). There is no connection between the two copies. Surely, the behavior of \(D\) shouldn't change because of the presence of a completely independent network of agents. But, with our model, it does.
A computation progresses by delivering a message from any nonempty channel. So, there is an infinite computation in which messages are delivered in channels in one copy of \(D\), and no messages are delivered from nonempty channels in the other copy.

The problem is that the selection of the nonempty channel in each iteration of the while loop may be unfair -- the same set of channels could be selected forever while other nonempty channels are never selected. The model has no provision for ensuring that a message in a channel will eventually be delivered. We will introduce fair selection, eventuality, and temporal logic later in the course.
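To make the unfairness concrete, here is a minimal sketch in Python of an unfair scheduler (the agent labels and data structures are illustrative, not part of the model's notation): it always delivers from the first nonempty channel in a fixed order, so the token in the second copy of \(D\) is never delivered.

```python
from collections import deque

# Two independent copies of D. In each copy, two agents bounce a
# token back and forth on the pair of channels between them.
channels = {
    ("A1", "B1"): deque(["token"]),  # copy 1 of D
    ("B1", "A1"): deque(),
    ("A2", "B2"): deque(["token"]),  # copy 2 of D
    ("B2", "A2"): deque(),
}

def receive(agent, sender, msg):
    # On receiving the token, an agent sends it back to the sender.
    channels[(agent, sender)].append(msg)

# An unfair scheduler: always deliver from the FIRST nonempty channel
# in a fixed iteration order. Copy 2's channels are never selected.
for step in range(10):
    sender, receiver = next((s, r) for (s, r), q in channels.items() if q)
    receive(receiver, sender, channels[(sender, receiver)].popleft())

# Copy 2's token sits undelivered, though its channel was nonempty
# at every step.
print(channels[("A2", "B2")])  # deque(['token'])
```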
There are many algorithms in which an agent sends itself a Timeout(\(T\)) message, where ideally the message is received approximately \(T\) seconds after it is sent. Though we never use \(T\) in proving the correctness of algorithms, we will use timeout messages in analyzing their performance.
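As an illustration, here is a sketch of the timeout idiom under a simulated clock, where the simulation delivers a Timeout(\(T\)) message exactly \(T\) seconds after it is sent (the event queue and agent names are illustrative, not part of the model):

```python
import heapq

events = []   # simulated network: (delivery_time, receiver, message)
now = 0.0

def send(receiver, message, delay=0.0):
    # In the simulation a Timeout(T) message arrives exactly T seconds
    # after it is sent; a real system only approximates this.
    heapq.heappush(events, (now + delay, receiver, message))

def pinger(message):
    kind, T = message
    if kind == "timeout":
        print(f"t={now}: timeout fired; do periodic work, rearm timer")
        send("pinger", ("timeout", T), delay=T)  # agent re-sends to itself

agents = {"pinger": pinger}
send("pinger", ("timeout", 5.0), delay=5.0)  # arm the first timer

for _ in range(3):  # deliver the next three events in time order
    now, receiver, message = heapq.heappop(events)
    agents[receiver](message)
```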
Agents can maintain accurate clocks by using atomic clocks, the Precision Time Protocol (PTP) in local area networks, and Network Time Protocol (NTP) servers. Accurate clocks have the property that the time at which a message is sent, as determined by the sender's clock, is (almost always) earlier than the time, as determined by the receiver's clock, at which the message is received. We do not, however, rule out the possibility that clocks drift apart so that the sender's clock is far ahead of the receiver's.
Some algorithms for systems with perfect clocks are simpler than those with imperfect clocks, as we shall see.
The model does not specify how termination is detected if the computation does terminate. Nor does the notation have primitives for shutting down agents and channels gracefully so that they don't continue to hold resources after computation has terminated. Protocols such as AMQP do have primitives for starting up and shutting down distributed systems, but we won't discuss them here.
We will describe algorithms that execute on faulty systems in which messages may be lost, duplicated, or delivered out of order, and where agents may stop forever or halt and restart. We also describe algorithms with Byzantine agents. These algorithms are based on models that are different from those given so far.
Later, we study algorithms in which the state space is continuous. Systems with continuous state spaces may have discrete or continuous state transitions.
Many models deal with channels in which senders are blocked when a channel gets full. Our model has no concept of a channel being full.
Other models allow for multiple channels from an agent \(X\) to an agent \(Y\). In our model, however, there can be at most one channel from \(X\) to \(Y\).
A channel is unidirectional.
It is represented by a directed edge in a directed graph.
Messages can be sent by an agent \(P\) to an agent \(Q\) along a channel \((P, Q)\). Agent \(Q\) cannot send a message to agent \(P\) along channel \((P, Q)\). A system may have a channel \((P, Q)\) and may or may not have a channel \((Q, P)\).
Some systems have bidirectional channels in which messages can be sent
in both directions on the same channel.
Our model does not allow for bidirectional channels.
In our model, a channel is directed from exactly one agent to exactly one agent. In some systems, multiple agents can send messages on the same channel, and multiple agents can receive messages on the same channel. Our model does not allow for such channels.
Because queues are FIFO (first-in-first-out), for all \(n\), the \(n\)-th message received on a channel is the \(n\)-th message sent on that channel.
Note that messages sent on different channels may not be delivered in the order sent.
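Both properties can be pictured with one queue per channel, as in this sketch (the Channel class is illustrative):

```python
from collections import deque

class Channel:
    """A FIFO channel: the n-th message received is the n-th sent."""
    def __init__(self):
        self.queue = deque()
    def send(self, msg):
        self.queue.append(msg)       # enqueue at the tail
    def deliver(self):
        return self.queue.popleft()  # dequeue from the head

# FIFO holds per channel ...
c1, c2 = Channel(), Channel()
c1.send("m1"); c2.send("m2")

# ... but across channels delivery order is unconstrained: m2 may be
# delivered before m1 even though it was sent later.
print(c2.deliver(), c1.deliver())  # m2 m1
```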
Suppose you want to design an algorithm in which an agent \(X\) refuses to receive messages from an agent \(Z\) until it first receives a message from an agent \(Y\). How would you use this model? In our model, agent \(X\) has to receive the message from \(Z\) in every state of \(X\). Agent \(X\) can copy the message from \(Z\) into a local queue -- a local variable of \(X\) -- and process the message only after receiving the message from \(Y\).
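Here is a sketch of that buffering pattern (the agent and message names are illustrative):

```python
from collections import deque

class AgentX:
    """Receives every message, but defers those from Z until a
    message from Y has arrived (a sketch; names are illustrative)."""
    def __init__(self):
        self.heard_from_Y = False
        self.deferred = deque()  # local queue holding messages from Z

    def receive(self, sender, msg):
        # The model forces X to accept the message in every state;
        # X chooses whether to process it now or buffer it.
        if sender == "Z" and not self.heard_from_Y:
            self.deferred.append(msg)
            return
        if sender == "Y":
            self.heard_from_Y = True
            self.process(sender, msg)
            while self.deferred:           # drain buffered Z messages
                self.process("Z", self.deferred.popleft())
            return
        self.process(sender, msg)

    def process(self, sender, msg):
        print(f"processing {msg!r} from {sender}")

x = AgentX()
x.receive("Z", "early")   # buffered, not processed
x.receive("Y", "go")      # processed, then the buffered message
```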
Our model assumes that channels have unlimited capacity. We can represent a situation in which channels have limited capacity in the following way. An agent has a local output queue of unlimited size. The agent puts messages into this queue when it cannot send messages on a channel because the channel is full. Messages from this queue are sent on the channel when the channel stops being full.
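A sketch of this arrangement, assuming the sender learns when a message is removed from the channel (the class and method names are illustrative):

```python
from collections import deque

class BoundedChannelSender:
    """An agent-side output queue of unlimited size in front of a
    channel of limited capacity (a sketch; names are illustrative)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.channel = deque()   # the bounded channel itself
        self.outbox = deque()    # unbounded local output queue

    def send(self, msg):
        if len(self.channel) < self.capacity:
            self.channel.append(msg)
        else:
            self.outbox.append(msg)  # channel full: hold locally

    def on_delivery(self):
        # Called when the receiver removes a message, so the channel
        # stops being full and a held message can move onto it.
        msg = self.channel.popleft()
        if self.outbox:
            self.channel.append(self.outbox.popleft())
        return msg

s = BoundedChannelSender(capacity=1)
s.send("m1"); s.send("m2")       # m2 waits in the outbox
print(s.on_delivery())           # m1; m2 now moves onto the channel
print(list(s.channel))           # ['m2']
```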
A workaround for the case in which agents are created or deleted a finite number of times is as follows. The network of agents in the model is made large enough to include agents that have not yet been created and also agents that have been deleted.

An agent that has not been created is represented by an agent that is idle, i.e., one that waits for a message. It never receives a message until the agent is created. The operating system creates the agent by sending it a "create" message and informing other agents that this agent has been created.

An agent that is deleted is represented in the same way. The operating system deletes an agent by sending a "delete" message to the agent. A deleted agent discards messages that it receives without taking action.
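The workaround might be sketched as follows (the states and message names are illustrative):

```python
class ManagedAgent:
    """An agent that is idle until 'created' and inert after
    'deleted' (a sketch of the workaround; names illustrative)."""
    def __init__(self):
        self.state = "not_created"   # idle: waiting for "create"

    def receive(self, msg):
        if self.state == "not_created":
            if msg == "create":       # sent by the operating system
                self.state = "active"
            return                    # otherwise stay idle
        if self.state == "deleted":
            return                    # discard without taking action
        if msg == "delete":           # sent by the operating system
            self.state = "deleted"
            return
        print(f"handling {msg!r}")    # normal processing when active

a = ManagedAgent()
a.receive("work")    # ignored: agent not created yet
a.receive("create")
a.receive("work")    # handled
a.receive("delete")
a.receive("work")    # discarded
```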
There are systems in which messages have different priorities, and the arrival of high-priority messages interrupts the execution of low-priority messages. Our model does not allow for interruptions.
Later we describe algorithms with different types of channels. For example, the state of a channel in which messages may overtake each other is a multiset or bag. A message is sent on a channel by adding the message to the multiset. A nonempty multiset may deliver any message in the multiset to the receiving agent.
Lossy channels, and channels in which messages have priorities, are also modeled by defining channel states appropriately.
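A sketch of such a bag channel (the class name is illustrative):

```python
import random

class BagChannel:
    """A channel whose state is a multiset: messages may overtake
    each other (a sketch; the name BagChannel is illustrative)."""
    def __init__(self):
        self.bag = []               # list used as a multiset

    def send(self, msg):
        self.bag.append(msg)        # add the message to the multiset

    def deliver(self):
        # A nonempty multiset may deliver ANY message it holds.
        i = random.randrange(len(self.bag))
        return self.bag.pop(i)

c = BagChannel()
for m in ["m1", "m2", "m3"]:
    c.send(m)
print([c.deliver() for _ in range(3)])  # some permutation of m1..m3
```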
There are problems in which we would like to start a distributed computation \(P\), wait for \(P\) to finish, and then start another distributed computation \(Q\). We want a barrier between \(P\) and \(Q\).
A barrier can be implemented in several ways.
An agent, say a coordinator agent, sends messages to all agents to start \(P\). The coordinator then detects termination of \(P\); we discuss termination detection algorithms later. After detecting termination of \(P\), the coordinator sends messages to all agents to start \(Q\).
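A sketch of this barrier, with termination detection replaced by explicit completion reports from the workers (an illustrative simplification, since we have not yet discussed termination detection):

```python
class Worker:
    def __init__(self, name):
        self.name = name
    def receive(self, msg):
        print(f"{self.name}: {msg}")

class Coordinator:
    """A barrier between computations P and Q. Here termination
    detection is replaced by explicit 'done with P' reports from
    the workers -- an illustrative simplification."""
    def __init__(self, workers):
        self.workers = workers
        self.pending = {w.name for w in workers}  # not yet done with P

    def start_P(self):
        for w in self.workers:
            w.receive("start P")

    def receive(self, sender, msg):
        if msg == "done with P":
            self.pending.discard(sender)
            if not self.pending:        # P has terminated everywhere
                for w in self.workers:  # the barrier: now start Q
                    w.receive("start Q")

ws = [Worker("w1"), Worker("w2")]
coord = Coordinator(ws)
coord.start_P()
coord.receive("w1", "done with P")
coord.receive("w2", "done with P")  # last report triggers "start Q"
```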
For example, an agent that is computing using a set of files may transition to a state in which it no longer needs those files; this transition occurs without the agent receiving a message. We model this event by having the agent send itself a message when it starts using the set of files; the agent transitions to the state in which it no longer needs the files when it receives the message from itself.
Representing internal state changes in this way is an artifice; however, the artifice allows us to use a very simple model in which a state change occurs when, and only when, a message is delivered.
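The artifice might be sketched as follows (the class and message names are illustrative):

```python
from collections import deque

class FileUser:
    """Models an internal state change (no longer needing files) as
    the receipt of a message the agent sent to itself (a sketch)."""
    def __init__(self):
        self.needs_files = False
        self.self_channel = deque()   # channel from the agent to itself

    def start_using_files(self):
        self.needs_files = True
        # The artifice: send yourself a message now; the state change
        # happens when, and only when, this message is delivered.
        self.self_channel.append("release files")

    def receive(self, msg):
        if msg == "release files":
            self.needs_files = False

a = FileUser()
a.start_using_files()
print(a.needs_files)                 # True: still using the files
a.receive(a.self_channel.popleft())
print(a.needs_files)                 # False: transition on delivery
```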
We use agents' local clocks for evaluating algorithm performance but not for proving correctness because even a small drift can cause race conditions.
Consider an algorithm in which one agent carries out a computation starting at 1pm and another agent carries out a computation starting at 2pm. When we prove the correctness of our algorithms, we allow for the unlikely possibility that the agent starting its computation at 2pm does so before the agent that starts at 1pm. For evaluating performance, however, we assume that the agent that starts at 1pm usually does so before the agent that starts at 2pm.
Perhaps the most widely used model is CSP -- Communicating Sequential Processes.
An overview of models used in parallel programming describes models used in shared-memory and distributed computing.
K. Mani Chandy, Emeritus Simon Ramo Professor, California Institute of Technology