CMSC 23340 Grid Computing

CMSC 23340/33340 Topics in Grid Computing

U.Chicago, Winter Quarter, 2005
Instructor: Prof. Ian Foster

When and Where: Mon & Wed 4:00-5:20, Ryerson 277.

Description: Federated distributed systems are collections of Internet-connected autonomous computing nodes spread across administrative domains. Participation in these federated systems allows access to potentially unique or large sets of resources such as data, storage space, computing power, or services. Examples of federated systems include computational grids, peer-to-peer networks, and wide-area testbeds such as PlanetLab. Building such systems involves challenges at multiple levels, from the network (e.g., transport, routing) to the algorithmic (e.g., data distribution, resource management) and even the social (e.g., incentives).

This course is a tour through various research topics in federated distributed systems. We will explore solutions and learn design principles for building large network-based computational systems. Our readings and discussions will help us identify research problems and understand methods and general approaches for designing, implementing, and evaluating distributed systems. Topics include resource management (discovery, allocation), data management (replication, location), security, fault tolerance, and system characterization. Our discussions will often be grounded in the context of deployed distributed systems such as Grids and peer-to-peer networks.

The course involves discussions of four papers a week and a project.

Textbook: While most of the papers are available on the Internet, we will also read several chapters from "The Grid2: Blueprint for a New Computing Infrastructure" (Grid2) by Ian Foster and Carl Kesselman. I'll provide a copy of this book to anyone who doesn't have one.

To subscribe to the class mail list, visit: http://mailman.cs.uchicago.edu/mailman/listinfo/cmsc33340.

Course Format

The course is structured to provide (a) an in-depth understanding of current topics in large-scale distributed systems research, and (b) experience with reviewing and presenting advanced technical material. The class workload has a participation component, an assignment, and a project, as described below.

Participation. In each class we discuss two research papers. Before the class, we all read the papers and develop brief answers to a set of standard questions, as follows:

State the main contribution of the paper.
Critique the main contribution.
1. Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
2. Rate how convincing the methodology is. You may consider some of the following questions (use what is relevant): do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)
3. What is the most important limitation of the approach?
What are the three strongest and/or most interesting ideas in the paper?
What are the three most striking weaknesses in the paper?
Name three questions that you would like to ask the authors.
Detail an interesting extension to the work not mentioned in the future work section.
Optional comments on the paper that you'd like to see discussed in class.

Reviews must be submitted by noon before class by email to the instructor. Each paper is discussed in class. Discussions will be led by one or more students and may include a brief (5-minute) presentation of the paper. Discussion leaders do not need to submit reviews, but they need to (a) prepare a discussion plan and email it to the instructor before class, and (b) prepare the master critique based on the in-class discussion and send it to the instructor before the following class.

Project. You have two options for the project:

	You can develop a research proposal in an area of distributed computing. This work will involve identifying a problem, reviewing approaches to the problem, proposing a novel approach to the problem, and finally describing how you would proceed to evaluate your approach.
	You can define, design, prototype, and document a non-trivial but simple distributed service using Globus Toolkit version 4 (GT4).

A project of either kind could be developed into something quite substantial. If you carry on and execute the research proposal, the result could be a published paper; if you complete and evaluate the service, the result could be some useful code or a published paper. I welcome people pursuing things to that stage, but this is not required to pass the course.

A list of project ideas will be posted, but students are highly encouraged to propose topics of their own interest.

Milestones:

	January 22: email me a one-page statement of the project you propose to do, noting the problem, proposed approach, and relevant reading. (Meet with me ahead of time to discuss your project plans.)
	January 26: in-class presentation and discussion of project plans.
	February 23: submit a 5-page summary of the problem you are tackling, the proposed approach, initial results, and expected structure of the final paper (which should be 10-15 pages). In-class presentation of preliminary results.
	Final week: submit paper and present project.

Grading. The course will be graded pass/fail. Credit will be assigned as follows. Reports: 25%. Participation: 10%. Discussion lead: 15%. Project: 50%.

Week 1

Jan 3: Introduction to the class, goals, and structure

Background reading on Grid and P2P applications and systems:
a) Scientific Data Federation: The World-Wide Telescope, Szalay and Gray, Grid2-Ch7.
b) Medical Data Federation: The Biomedical Informatics Research Network, Ellisman and Peltier, Grid2-Ch8.
c) Concepts and Architecture, Foster and Kesselman, Grid2-Ch4.
d) Peer-to-Peer Technologies. Crowcroft, J., Moreton, T., Pratt, I. and Twigg, A. Grid2-Ch29.

Jan 5: Data movement [John Bresnahan]

a) The Livny and Plank-Beck Problems: Studies in Data Movement on the Computational Grid, Allen and Wolski, SC'2003.
b) Slurpie: A Cooperative Bulk Data Transfer Protocol, Sherwood et al., INFOCOM 2004.
See also:
c) Incentives Build Robustness in BitTorrent, Bram Cohen, 2003.

Week 2

Jan 10: System characterization.

a) Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload, Gummadi et al., SOSP 2003.
b) Understanding Availability, Ranjita Bhagwan, Stefan Savage, Geoffrey M. Voelker, IPTPS 2003.

Jan 12: Availability and monitoring [Ioan Raicu]

a) Total Recall: System Support for Automated Availability Management, Ranjita Bhagwan, Kiran Tati, Yu-Chung Cheng, Stefan Savage, and Geoffrey M. Voelker, NSDI 2004.
b) The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Massie, Chun, and Culler. Parallel Computing, Vol. 30, Issue 7, July 2004.

Week 3

Jan 17: End-to-end arguments in system design

a) Middleboxes No Longer Considered Harmful, Michael Walfish, Jeremy Stribling, Maxwell Krohn, Hari Balakrishnan, Robert Morris, Scott Shenker, OSDI 2004

b) An End-to-End Approach to Globally Scalable Network Storage, Micah Beck, Terry Moore, James S. Plank, SIGCOMM 2002

Further reading:

c) End-to-end arguments in system design, Jerome H. Saltzer, David P. Reed, David D. Clark, ACM Transactions on Computer Systems, 1984.

d) Rethinking the design of the Internet: The end to end arguments vs. the brave new world, D. Clark and M. Blumenthal, Workshop on Policy Implications of End-to-End. 2001.

e) Network Infrastructure, J. Touch and J. Postel, Grid2-Ch30.

Jan 19: Building reliable services

a) Chain Replication for Supporting High Throughput and Availability, Robbert van Renesse and Fred B. Schneider, OSDI 2004.
b) The Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, SOSP 2003.
Further reading:
c) Building Reliable Clients and Services. Thain, D. and Livny, M. Grid2-Ch16.

Week 4

Jan 24: Data

a) An Approach for Automatic Data Virtualization, Li Weng, Gagan Agrawal, Umit Catalyurek, Tahsin Kurc, Sivaramakrishnan Narayanan, Joel Saltz, HPDC 2004.
b) Privacy-Preserving Data Mining on Data Grids in the Presence of Malicious Participants, Bobi Gilburd, Assaf Schuster, Ran Wolff, HPDC 2004.
Further reading:
c) Data Access, Integration, and Management. Atkinson, M., Chervenak, A., Kunszt, P., Narang, I., Paton, N., Pearson, D., Shoshani, A. and Watson, P. Grid2-Ch22.

Jan 26: In-class presentation of project plans.

Week 5

Jan 31: Transport [Borja Sotomayor]

a) Evaluation of Rate-based Transport Protocols for Lambda-Grids, Ryan Wu, Andrew Chien, HPDC 2004.
b) Differential Serialization for Optimized SOAP Performance, Nayef Abu-Ghazaleh, Michael J. Lewis, Madhusudhan Govindaraju, HPDC 2004.

Feb 2: System management and debugging

a) Model-Based Resource Provisioning in a Web Service Utility, Ronald P. Doyle, Jeffrey S. Chase, Omer M. Asad, Wei Jin, Amin M. Vahdat

b) Configuration Debugging as Search: Finding the Needle in the Haystack, Andrew Whitaker, Richard S. Cox, and Steven D. Gribble

Week 6

Feb 7: No class. (Although plans may change.)

Feb 9: No class. (Although plans may change.)

Week 7

Feb 14: Security

a) Secure routing for structured peer-to-peer overlay networks, Miguel Castro, Peter Druschel, Ayalvadi Ganesh, Antony Rowstron, Dan S. Wallach, OSDI 2002.
b) Automated Worm Fingerprinting, Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage, OSDI 2004.

Further reading:

c) Security for Virtual Organizations: Federating Trust and Policy Domains, Siebenlist, Nagaratnam, et al., Grid2-Ch21.

Feb 16: No class. (Although plans may change.)

Week 8

Feb 21: Naming

a) The Design and Implementation of a Next Generation Name Service for the Internet, Venugopalan Ramasubramanian, Emin Gun Sirer, SIGCOMM 2004.

b) Impact of Configuration Errors on DNS Robustness, Vasileios Pappas, Zhiguo Xu, Songwu Lu, Daniel Massey, Andreas Terzis, Lixia Zhang, SIGCOMM 2004.

Further reading:

c) Development of the Domain Name System. Mockapetris, P.V. and Dunlap, K., SIGCOMM, 1988, ACM, 123-133.

Feb 23: Information services

a) Mercury: Supporting Scalable Multi-Attribute Range Queries, Ashwin R. Bharambe, Mukesh Agrawal, Srinivasan Seshan, SIGCOMM 2004

b) A Scalable Distributed Information Management System, Praveen Yalagandula, Mike Dahlin, SIGCOMM 2004

Week 9

Feb 28: Virtualization

a) Xen and the Art of Virtualization, Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, SOSP 2003.
b) From Virtualized Resources to Virtual Computing Grids: The In-VIGO System. Adabala, S., Chadha, V., Chawla, P., Figueiredo, R., Fortes, J., Krsul, I., Matsunaga, A., Tsugawa, M., Zhang, J., Zhao, M., Zhu, L. and Zhu, X., Future Generation Computer Systems. 2004.

Mar 2: Virtualization

a) Distributed File System Support for Virtual Machines in Grid Computing, Ming Zhao, Jian Zhang, Renato Figueiredo, HPDC 2004.

b) Towards Virtual Networks for Virtual Machine Grid Computing. A. Sundararaj and P. Dinda. 3rd USENIX Virtual Machine Research and Technology Symposium, 2004.

Week 10

Mar 7: TBD. (Something interesting that comes up in class.)

Mar 9: Present project reports.