READING LIST AND BIBLIOGRAPHY
14.1 SUGGESTIONS FOR FURTHER READING
14.1.1 Introduction and General Works
Coulouris et al., Distributed Systems: Concepts and Design
A good general text on distributed systems. Its coverage is similar to the material found in this book, but it is organized completely differently. There is much material on distributed transactions, along with some older material on distributed shared memory systems.
Foster and Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure
This is the second edition of a book in which many Grid experts highlight various issues of large-scale Grid computing. The book covers all the important topics, including many examples of current and future applications.
Neuman, "Scale in Distributed Systems"
One of the few papers that provides a systematic overview of the issue of scale in distributed systems. It takes a look at caching, replication, and distribution as scaling techniques, and provides a number of rules of thumb for applying these techniques when designing large-scale systems.
Silberschatz et al., Applied Operating System Concepts
A general textbook on operating systems including material on distributed systems with an emphasis on file systems and distributed coordination.
Verissimo and Rodrigues, Distributed Systems for Systems Architects
An advanced reading on distributed systems, basically covering the same material as in this book. Relatively more emphasis is put on fault tolerance and real-time distributed systems. Attention is also paid to management of distributed systems.
Zhao and Guibas, Wireless Sensor Networks
Many books on (wireless) sensor networks describe these systems from a networking viewpoint. This book takes a more systems-oriented perspective, which makes it an attractive read for those interested in distributed systems. The book gives good coverage of wireless sensor networks.
14.1.2 Architecture
Babaoglu et al., Self-star Properties in Complex Information Systems
Much has been said about self-* systems, but not always with the degree of substance that would be preferred. This book contains a collection of papers from authors with a variety of backgrounds that consider how self-* aspects find their way into modern computer systems.
Bass et al., Software Architecture in Practice
This widely used book gives an excellent practical introduction to and overview of software architecture. Although the focus is not specifically on distributed systems, it provides an excellent basis for understanding the various ways that complex software systems can be organized.
Hellerstein et al., Feedback Control of Computing Systems
For those readers with some mathematical background, this book provides a thorough treatment of how feedback control loops can be applied to (distributed) computer systems. As such, it forms an alternative basis for much of the research on self-* and autonomic computing systems.
Lua et al., "A Survey and Comparison of Peer-to-Peer Overlay Network Schemes"
An excellent survey of modern peer-to-peer systems, covering structured as well as unstructured networks. This paper forms a good introduction for those who want to go deeper into the subject but do not really know where to start.
Oram, Peer-to-Peer: Harnessing the Power of Disruptive Technologies
This book bundles a number of papers on the first generation of peer-to-peer networks. It covers various projects as well as important issues such as security, trust, and accountability. Despite the fact that peer-to-peer technology has made a lot of progress, this book is still valuable for understanding many of the basic issues that needed to be addressed.
White et al., "An Architectural Approach to Autonomic Computing"
Written by the technical people behind the idea of autonomic computing, this short paper gives a high-level overview of the requirements that need to be met for self-* systems.
14.1.3 Processes
Andrews, Foundations of Multithreaded, Parallel, and Distributed Programming
If you ever need a thorough introduction to programming parallel and distributed systems, this is the book to look for.
Lewis and Berg, Multithreaded Programming with Pthreads
Pthreads form the POSIX standard for implementing threads for operating systems and are widely supported by UNIX-based systems. Although the authors concentrate on Pthreads, this book provides a good introduction to thread programming in general. As such, it forms a solid basis for developing multithreaded clients and servers.
Schmidt et al., Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects
Researchers have also looked at common design patterns in distributed systems. These patterns can ease the development of distributed systems as they allow programmers to concentrate more on system-specific issues. In this book, design patterns are discussed for service access, event handling, synchronization, and concurrency.
Smith and Nair, Virtual Machines: Versatile Platforms for Systems and Processes
These authors have also published a brief overview of virtualization in the May 2005 issue of Computer, but this book goes into many of the (often intricate) details of virtual machines. As we have mentioned in the text, virtual machines are becoming increasingly important for distributed systems. This book forms an excellent introduction to the subject.
Stevens and Rago, Advanced Programming in the UNIX Environment
If there is ever a need to purchase a single volume on programming on UNIX systems, this is the book to consider. Like other books written by the late Richard Stevens, this volume contains a wealth of detailed information on how to develop servers and other types of programs. This second edition has been extended by Rago, who is also well known for books on similar topics.
14.1.4 Communication
Birrell and Nelson, "Implementing Remote Procedure Calls"
A classic paper on the design and implementation of one of the first remote procedure call systems.
Hohpe and Woolf, Enterprise Integration Patterns
Like other material on design patterns, this book provides high-level overviews of how to construct messaging solutions. The book forms an excellent read for those wanting to design message-oriented solutions, and covers a wealth of patterns that can be followed during the design phase.
Peterson and Davie, Computer Networks: A Systems Approach
An alternative textbook on computer networks that takes an approach somewhat similar to that of this book by considering a number of principles and how they apply to networking.
Steinmetz and Nahrstedt, Multimedia Systems
A good textbook (although poorly copyedited) covering many aspects of (distributed) systems for multimedia processing, together forming a fine introduction to the subject.
14.1.5 Naming
Albitz and Liu, DNS and BIND
BIND is a publicly available and widely used implementation of a DNS server. In this book, all the details are discussed on setting up a DNS domain using BIND. As such, it provides a lot of practical information on the largest distributed naming service in use today.
Balakrishnan et al., "Looking up Data in P2P Systems"
An easy-to-read and good introduction to lookup mechanisms in peer-to-peer systems. Only a few details are provided on the actual working of these mechanisms, but the paper forms a good starting point for further reading.
Balakrishnan et al., "A Layered Naming Architecture for the Internet"
In this paper, the authors argue for combining structured naming with flat naming, thereby distinguishing three different levels: (1) human-friendly names, which are to be mapped to service identifiers, (2) the service identifiers, which are to be mapped to end-point identifiers that uniquely identify a host, and (3) the end points, which are to be mapped to network addresses. Of course, for those parts in which only identifiers are used, one can conveniently use a DHT-based system.
Loshin, Big Book of Lightweight Directory Access Protocol (LDAP) RFCs
LDAP-based systems are widely used in distributed systems. The ultimate source for LDAP services is the set of RFCs published by the IETF. Loshin has collected all the relevant ones in a single volume, making it the comprehensive source for designing and implementing LDAP services.
Needham, "Names"
An easy-to-read and excellent article on the role of names in distributed systems. Emphasis is on naming systems as discussed in Section 5.3, using DEC's GNS as an example.
Pitoura and Samaras, "Locating Objects in Mobile Computing"
This article can be used as a comprehensive introduction to location services. The authors discuss various kinds of location services, including those used in telecommunications systems. The article has an extensive list of references that can be used as a starting point for further reading.
Saltzer, "Naming and Binding Objects"
Although written in 1978 and focused on nondistributed systems, this paper should be the starting point for any research on naming. The author provides an excellent treatment of the relation between names and objects, and, in particular, of what it takes to resolve a name to a referenced object. Separate attention is paid to the concept of closure mechanisms.
14.1.6 Synchronization
Guerraoui and Rodrigues, Introduction to Reliable Distributed Programming
A somewhat misleading title for a book that largely concentrates on distributed algorithms that achieve reliability. The book has accompanying software that allows many of the theoretical descriptions to be tested in practice.
Lynch, Distributed Algorithms
Using a single framework, the book describes many different kinds of distributed algorithms. Three different timing models are considered: simple synchronous models, asynchronous models without any timing assumptions, and partially synchronous models, which come close to real systems. Once you get used to the theoretical notation, you will find that this book contains many useful algorithms.
Raynal and Singhal, "Logical Time: Capturing Causality in Distributed Systems"
This paper describes in relatively simple terms three types of logical clocks: scalar time (i.e., Lamport timestamps), vector time, and matrix time. In addition, the paper describes various implementations that have been used in a number of practical and experimental distributed systems.
Tel, Introduction to Distributed Algorithms
An alternative introductory textbook for distributed algorithms, which concentrates solely on solutions for message-passing systems. Although quite theoretical, in many cases the reader can quite easily construct solutions for real systems.
14.1.7 Consistency and Replication
Adve and Gharachorloo, "Shared Memory Consistency Models: A Tutorial"
Until recently, there have been many groups developing distributed systems in which the physically dispersed memories were joined together into a single virtual address space, leading to what are known as distributed shared memory systems. Various memory consistency models have been designed for these systems and form the basis for the models discussed in Chap. 7. This paper provides an excellent introduction to these memory consistency models.
Gray et al., "The Dangers of Replication and a Solution"
The paper discusses the trade-off between replication implementing sequential consistency models (called eager replication) and lazy replication. Both forms of replication are formulated for transactions. The problem with eager replication is its poor scalability, whereas lazy replication may easily lead to difficult or impossible conflict resolutions. The authors propose a hybrid scheme.
Saito and Shapiro, "Optimistic Replication"
The paper presents a taxonomy of optimistic replication algorithms as used for weak consistency models. It describes an alternative way of looking at replication and its associated consistency protocols. An interesting issue is the discussion on scalability of various solutions. The paper also includes a large number of useful references.
Sivasubramanian et al., "Replication for Web Hosting Systems"
In this paper, the authors discuss the many aspects that need to be addressed to handle replication for Web hosting systems, including replica placement, consistency protocols, and routing requests to the best replica. The paper also includes an extensive list of relevant material.
Wiesmann et al., "Understanding Replication in Databases and Distributed Systems"
Traditionally, there has been a difference between dealing with replication in distributed databases and in general-purpose distributed systems. In databases, the main reason for replication used to be to improve performance. In general-purpose distributed systems, replication has often been done for improving fault tolerance. The paper presents a framework that allows solutions from these two areas to be more easily compared.
14.1.8 Fault Tolerance
Marcus and Stern, Blueprints for High Availability
There are many issues to be considered when developing (distributed) systems for high availability. The authors of this book take a pragmatic approach and touch upon many of the technical and nontechnical issues.
Birman, Reliable Distributed Systems
Written by an authority in the field, this book contains a wealth of information on the pitfalls of developing highly dependable distributed systems. The author provides many examples from academia and industry to illustrate what can go wrong and what can be done about it. The book covers a wide variety of topics, including client/server computing, Web services, object-based systems (CORBA), and also peer-to-peer systems.
Cristian and Fetzer, "The Timed Asynchronous Distributed System Model"
The paper discusses a model for distributed systems that is more realistic than the pure synchronous or asynchronous cases. Two important assumptions are that services are complete within a specific time interval, and that communication is unreliable and subject to performance failures. The paper demonstrates the applicability of this model for capturing important properties of real distributed systems.
Guerraoui and Schiper, "Software-Based Replication for Fault Tolerance"
A brief and clear overview of how replication in distributed systems can be applied for improving fault tolerance. The paper discusses primary-backup replication as well as active replication, and relates replication to group communication.
Jalote, Fault Tolerance in Distributed Systems
One of the few textbooks entirely directed toward fault tolerance in distributed systems. The book covers reliable broadcasting, recovery, replication, and process resilience. There is a separate chapter on software design faults.
14.1.9 Security
Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems
One of the very few books that successfully aims at covering the whole security area. The book discusses the basics such as passwords, access control, and cryptography. Security is tightly coupled to application domains, and security in several domains is discussed: the military, banking, medical systems, among others. Finally, social, organizational, and political aspects are discussed as well. A great starting point for further reading and research.
Bishop, Computer Security: Art and Science
Although this book is not specifically written for distributed systems, it contains a wealth of information on general issues for computer security, including many of the topics discussed in Chap. 9. Furthermore, there is material on security policies, assurance, evaluation, and many implementation issues.
Blaze et al., "The Role of Trust Management in Distributed Systems Security"
The paper argues that large-scale distributed systems should be able to grant access to a resource using a simpler approach than current ones. In particular, if the set of credentials accompanying a request is known to comply with a local security policy, the request should be granted. In other words, authorization should take place without separating authentication and access control. The paper explains this model and shows how it can be implemented.
Kaufman et al., Network Security
This authoritative and frequently witty book is the first place to look for an introduction to network security. Secret and public key algorithms and protocols, message hashes, authentication, Kerberos, and e-mail are all explained at length. The best parts are the interauthor (and even intra-author) discussions, labeled by subscripts, as in: "I₂ could not get me₁ to be very specific ..."
Menezes et al., Handbook of Applied Cryptography
The title says it all. The book provides the necessary mathematical background to understand the many different cryptographic solutions for encryption, hashing, and so on. Separate chapters are devoted to authentication, digital signatures, key establishment, and key management.
Rafaeli and Hutchison, "A Survey of Key Management for Secure Group Communication"
The title says it all. The authors discuss various schemes that can be used in those systems where process groups need to communicate and interact in a secure way. The paper concentrates on the means to manage and distribute keys.
Schneier, Secrets and Lies
By the same author as Applied Cryptography, this book focuses on explaining security issues for nontechnical people. An important observation is that security is not just a technological issue. In fact, what can be learned from reading this book is that perhaps most of the security-related risks have to do with humans and the way we organize things. As such, it supplements much of the material we presented in Chap. 9.
14.1.10 Distributed Object-Based Systems
Emmerich, Engineering Distributed Objects
An excellent book devoted entirely to remote-object technology, paying specific attention to CORBA, DCOM, and Java RMI. As such, it provides a good basis for comparing these three popular object models. In addition, material is presented on designing systems using remote objects, handling different forms of communication, locating objects, persistence, transactions, and security.
Fleury and Reverbel, "The JBoss Extensible Server"
Many Web applications are based on the JBoss J2EE object server. In this paper, the original developers of that server outline the underlying principles and general design.
Henning, "The Rise and Fall of CORBA"
Written by an expert on CORBA development (but who has come to other insights), this article contains strong arguments against the use of CORBA. Most salient is the fact that Henning believes that CORBA is simply too complex and that it does not make the lives of developers of distributed systems any easier.
Henning and Vinoski, Advanced CORBA Programming with C++
If you need material on programming CORBA, and want to learn a lot along the way about what CORBA means in practice, this book will be your choice. Written by two people involved in specifying and developing CORBA systems, the book is full of practical and technical details without being limited to a specific CORBA implementation.
14.1.11 Distributed File Systems
Blanco et al., "A Survey of Data Management in Peer-to-Peer Systems"
An extensive survey, covering many important peer-to-peer systems. The authors describe data management issues including data integration, query processing, and data consistency.
Pate, UNIX Filesystems: Evolution, Design, and Implementation
This book describes many of the filesystems that have been developed for UNIX systems, but also contains a separate chapter on distributed file systems. It gives an overview of the various NFS versions, as well as filesystems for server clusters.
Satyanarayanan, "The Evolution of Coda"
Coda is an important distributed file system for supporting mobile users. In particular, it has advanced features for supporting what are known as disconnected operations, by which a user can continue to work on his own set of files without having contact with the main servers. This article describes how the system has evolved over the years as new requirements surfaced.
Zhu et al., "Hibernator: Helping Disk Arrays Sleep through the Winter"
Data centers use an incredible number of disks to get their work done. Obviously, this requires a vast amount of energy. This paper describes various techniques by which energy consumption can be brought down by, for example, distinguishing hot data from data that is not accessed so often.
14.1.12 Distributed Web-Based Systems
Alonso et al., Web Services: Concepts, Architectures and Applications
The popularity and intricacy of Web services have led to an endless stream of documents, too many of which can be characterized only as garbage. In contrast, this is one of those very few books that gives a crystal-clear description of what Web services are all about. Highly recommended as an introduction for the novice, an overview for those who have read too much of the garbage, and an example for those producing the garbage.
Chappell, Understanding .NET
The approach that Microsoft has taken to support the development of Web services is to combine many of its existing techniques into a single framework, along with adding a number of new features. The result is called .NET. This approach has caused much confusion about what this framework actually is. David Chappell does a good job of explaining matters.