This page discusses limitations of the model and introduces other models that are used later in the course.
The model given in the previous pages has limitations. We describe the limitations and show how they are managed in models introduced later in the course.
Consider the
example of two agents sending
tokens to each other.
Each agent sends a token that it holds to the other agent.
So there is a computation in which tokens go back and forth between
the agents forever.
Let's call this distributed system D
.
Now, consider a system consisting of two identical, and totally
independent, copies of D
.
There is no connection between the two copies.
Surely, the behavior of D
shouldn't change because of
the presence of a completely independent network of agents.
But, with our model, it does.
A computation progresses by delivering a message from any
nonempty channel.
So, there is an infinite computation in which messages are delivered
in channels in one copy of D
, and no messages are
delivered from nonempty channels in the other copy
The problem is that the selection of the nonempty channel in each iteration of the while loop may be unfair -- the same set of channels could be selected forever while other nonempty channels are never selected. The model has no provision for ensuring that a message in a channel will be delivered eventually. We will introduce fair selection, eventuality, and temporal logic later in the course.
There are many algorithms in which an agent sends itself a Timeout(T)
message where ideally the message would be received approximately T
seconds after it is sent.
Though we never use T
in proving the correctness of algorithms we will use
timeout messages in analyzing their performance.
Agents can maintain accurate clocks by using atomic clocks, Precision Time Protocols (PTP) in local area networks, and Network Time Protocol (NTP) servers. Accurate clocks have the property that the time at which an event is sent, as determined by the sender's clock, is (almost always) earlier than the time of the event in which the message is received. We do not, however, rule out the possibility that clocks drift apart so that the sender's clock is far ahead of the receiver's.
Some algorithms for systems with perfect clocks are simpler than those with imperfect clocks, as we shall see.
The model does not specify how termination is detected if the computation does terminate. Nor does the notation have primitives for shutting down agents and channels gracefully so that they don't continue to hold resources after computation has terminated. Protocols such as AMQP do have primitives for starting up and shutting down distributed systems, but we won't discuss them here.
We will describe algorithms that execute on faulty systems in which messages may be lost, duplicated, or delivered out of order, and where agents may stop forever or halt and restart. We also describe algorithms with Byzantine agents. These algorithms are based on models that are different from those given so far.
Later, we study algorithms in which the state space is continuous. Systems with continuous state spaces may have discrete or continuous state transitions.
K. Mani Chandy, Emeritus Simon Ramo Professor, California Institute of Technology