The observation commonly referred to as Amdahl's law was first formulated by Gene Amdahl in 1967. Asymptotic analysis of parallel algorithms is discussed in many computer science texts, such as those by Akl, Leighton, and Smith. Cook discusses problems for which no efficient parallel algorithms have been discovered.
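As a reminder of the result these references build on, here is a minimal sketch of Amdahl's law: if a fraction f of a computation is inherently serial, the speedup on P processors is bounded by 1/(f + (1-f)/P). The function name below is ours, chosen for illustration.

```python
def amdahl_speedup(serial_fraction, p):
    """Upper bound on speedup with p processors when serial_fraction
    of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

Note that as p grows, the speedup approaches 1/serial_fraction: a 5 percent serial fraction caps speedup at 20 no matter how many processors are used.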
Many different approaches to performance modeling have been proposed, each appropriate for different purposes. See, for example, the papers by Culler et al., Eager et al., Flatt and Kennedy, Karp and Flatt, and Nussbaum and Agarwal. Patel discusses the modeling of shared-memory computers. The book by Kumar et al. provides many example models and a more detailed treatment of the concept of isoefficiency. Gustafson et al. [129,130] introduce the concept of scaled speedup. Singh, Hennessy, and Gupta, Sun and Ni, and Worley [297,298] discuss various constraints on the scalability of parallel programs. Lai and Sahni and Quinn and Deo discuss speedup anomalies in search problems. Faber et al. argue against the concept of superlinear speedup. Fromm et al., Harrison and Patel, and Thomasian and Bay use queuing models to study the performance of parallel systems. Kleinrock reviews techniques used for performance analysis of networks and discusses issues that arise in high-speed (gigabit/sec) WANs.
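The scaled-speedup idea attributed above to Gustafson et al. can be sketched as follows: rather than fixing the problem size, one scales it with the machine, so that with serial fraction f (measured on the scaled problem) the speedup on P processors grows linearly as f + (1-f)P. The function name below is ours, for illustration only.

```python
def scaled_speedup(serial_fraction, p):
    """Scaled (Gustafson-style) speedup: the problem grows with the
    machine, so the parallel part contributes a full factor of p."""
    return serial_fraction + (1.0 - serial_fraction) * p
```

In contrast to Amdahl's fixed-size bound, this model predicts speedup that keeps growing with P as long as the problem size can grow too.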
The chapter notes in Chapter 1 provide references on parallel computer architecture. Feng provides a tutorial on interconnection networks. Hypercube networks have been used in a variety of multicomputers such as the Cosmic Cube, nCUBE-2, Intel iPSC, and Thinking Machines CM2. The Intel DELTA and Intel Paragon use two-dimensional mesh networks. The Cray T3D and MIT J-Machine use a three-dimensional torus. Adams, Agrawal, and Siegel survey multistage interconnection networks, and Harrison discusses the analytic modeling of these networks. Various forms of multistage network have been used in the BBN Butterfly, NYU Ultracomputer, IBM RP3, and IBM SP. The IBM SP uses a bidirectional multistage network constructed from 4×4 crossbars (a modified 4-ary n-fly) similar to that illustrated in Figure 3.18. Seitz [253,255] provides an introduction to multicomputers and their interconnection networks. Dally discusses networks and the concept of bisection width, while Leighton provides a more detailed and theoretical treatment. Dally and Seitz [70,71] discuss routing techniques. The material in Example 3.8 is based on work by Foster and Worley. Ethernet was designed by Metcalfe and Boggs; Shoch, Dalal, and Redell describe its evolution.
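The bisection widths of the topologies mentioned above are easy to tabulate: a sketch, with a helper function of our own devising, using the standard results that a P-node hypercube has bisection width P/2, a square two-dimensional mesh has width sqrt(P), and a single shared bus (as in classic Ethernet) has width 1.

```python
import math

def bisection_width(topology, p):
    """Number of links cut when an idealized p-node network of the
    given topology is split into two equal halves (illustrative helper)."""
    if topology == "hypercube":
        return p // 2                 # one cross-link per node pair across the cut
    if topology == "mesh2d":
        return int(math.isqrt(p))     # cut along one column of a sqrt(p) x sqrt(p) mesh
    if topology == "bus":
        return 1                      # a single shared medium
    raise ValueError("unknown topology: " + topology)
```

These figures explain, for example, why a 64-node hypercube (bisection width 32) tolerates all-to-all traffic far better than a 64-node mesh (width 8).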
Miller and Katz, Foster, Henderson, and Stevens, and Pool et al. discuss the I/O requirements of scientific and engineering applications. Del Rosario and Choudhary discuss problems and prospects for parallel I/O. Henderson, Nickless, and Stevens discuss application I/O requirements and describe a flexible I/O architecture for parallel computers. Plank and Li discuss checkpointing. Bordawekar, del Rosario, and Choudhary explain the utility of a two-phase I/O strategy. A special issue of the Journal of Parallel and Distributed Computing discusses various aspects of parallel I/O, as do Aggarwal and Vitter and Katz, Gibson, and Patterson. DeWitt and Gray discuss parallel database machines. Gibson examines the design and performance analysis of redundant disk arrays (RAID disks). Hennessy and Patterson provide a good description of I/O system performance analysis and design.
The parallel versions of Floyd's shortest-path algorithm are due to Jenq and Sahni, while the parallel version of Dijkstra's single-source algorithm is described by Paige and Kruskal. Our analysis of these algorithms follows Kumar and Singh, who also present analyses that take into account bandwidth limitations on hypercube and two-dimensional mesh architectures. Bertsekas and Tsitsiklis describe a pipelined variant of Floyd 2 that improves performance by allowing iterations to proceed concurrently, subject only to dataflow constraints. Aho, Hopcroft, and Ullman and Cormen, Leiserson, and Rivest provide good introductions to sequential graph algorithms. Quinn and Deo and Das, Deo, and Prasad [73,74] describe parallel graph algorithms. Ranka and Sahni's book on parallel algorithms for image processing and pattern recognition includes relevant material.
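For reference, here is the sequential algorithm underlying the parallel Floyd variants cited above: a minimal sketch of the standard all-pairs shortest-path triple loop, operating in place on a distance matrix (the parallel versions distribute the rows or blocks of this matrix and broadcast row k at step k).

```python
def floyd_all_pairs(d):
    """Floyd's all-pairs shortest-path algorithm, in place.
    d is an n x n matrix; d[i][j] is the edge weight from i to j
    (float('inf') if absent, 0 on the diagonal)."""
    n = len(d)
    for k in range(n):          # allow paths through intermediate vertex k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

In the parallel formulations analyzed in the text, the outer k loop is inherently sequential, while the i and j loops over matrix entries are partitioned among processors.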
© Copyright 1995 by Ian Foster