DistributedSystemModels/DistributedSystemModels.html

Model Limitations

This page discusses limitations of the model and introduces other models that are used later in the course.

The model given in the previous pages has limitations. We describe the limitations and show how they are managed in models introduced later in the course.

Fairness and Progress

In the model, a computation progresses by delivering a message on any channel. This allows for infinite computations in which some messages remain in a channel forever.

Consider the example of two agents sending tokens to each other. Each agent sends a token that it holds to the other agent. So there is a computation in which tokens go back and forth between the agents forever. Let's call this distributed system D.

Now, consider a system consisting of two identical, and totally independent, copies of D. There is no connection between the two copies. Surely, the behavior of D shouldn't change because of the presence of a completely independent network of agents. But, with our model, it does.

A computation progresses by delivering a message from any nonempty channel. So, there is an infinite computation in which messages are delivered in channels in one copy of D, and no messages are delivered from nonempty channels in the other copy

The problem is that the selection of the nonempty channel in each iteration of the while loop may be unfair -- the same set of channels could be selected forever while other nonempty channels are never selected. The model has no provision for ensuring that a message in a channel will be delivered eventually. We will introduce fair selection, eventuality, and temporal logic later in the course.

No Construct for Time

The only representation of time in our model is that some events occur after others. An event in which a message is received occurs after an event in which that message is sent. Time plays a critical role in the performance of algorithms even though we never use time in proving the correctness of algorithms.

There are many algorithms in which an agent sends itself a Timeout(T) message where ideally the message would be received approximately T seconds after it is sent. Though we never use T in proving the correctness of algorithms we will use timeout messages in analyzing their performance.

Agents can maintain accurate clocks by using atomic clocks, Precision Time Protocols (PTP) in local area networks, and Network Time Protocol (NTP) servers. Accurate clocks have the property that the time at which an event is sent, as determined by the sender's clock, is (almost always) earlier than the time of the event in which the message is received. We do not, however, rule out the possibility that clocks drift apart so that the sender's clock is far ahead of the receiver's.

Some algorithms for systems with perfect clocks are simpler than those with imperfect clocks, as we shall see.

Start Up, Shutdown, Failure

The model assumes that all agents and channels are initialized and then agents start receiving messages. The model has a barrier between the point at which initialization takes place and the point at which messages are delivered. The barrier isn't necessary in most algorithms, though it assuming its existence helps us to focus on more important parts of the algorithm.

The model does not specify how termination is detected if the computation does terminate. Nor does the notation have primitives for shutting down agents and channels gracefully so that they don't continue to hold resources after computation has terminated. Protocols such as AMQP do have primitives for starting up and shutting down distributed systems, but we won't discuss them here.

We will describe algorithms that execute on faulty systems in which messages may be lost, duplicated, or delivered out of order, and where agents may stop forever or halt and restart. We also describe algorithms with Byzantine agents. These algorithms are based on models that are different from those given so far.

Discrete State Space

The state space is discrete in most of the algorithms described here. The state space of some distributed systems has both discrete and continuous components. The state space of a fleet of drones has continuous components.

Later, we study algorithms in which the state space is continuous. Systems with continuous state spaces may have discrete or continuous state transitions.

Summary

Models and notations in this course are not comprehensive -- they do not capture most aspects of distributed systems. We use a model that is appropriate for describing the algorithm at hand.

Properties of Dataflow

Events and Dataflow: FAQ