Using Poncho to deal with private clouds

April 28, 2014

Clouds have become a new way of providing computational, storage, and networking capabilities to users on demand. Complementing the use of public cloud providers, such as Amazon EC2, is the emergence of private clouds whose participants all belong to the same organization.

One might think that managing resources within a private cloud would be far easier than within a public cloud. Often, however, the opposite is true. A principal reason is the lack of a structured interface between cloud users and cloud operators, resulting in poor coordination between the two. For example, operators cannot determine which resources are needed for higher-priority vs. lower-priority tasks.

System administrators operating the Magellan cluster at Argonne National Laboratory have faced just these issues. To improve the communication between users and operators, therefore, they have developed and implemented Poncho. Through the use of application programming interfaces, or APIs, Poncho enables communication between cloud operators and their users.

“Poncho is our attempt to help us avoid a downpour of requests for services and resources with no knowledge of their relative importance,” said Narayan Desai, a principal experimental systems engineer in Argonne’s Mathematics and Computer Science Division. “It provides a conduit for users to provide us key information, so that we can better meet user needs.”

For example, the system operator may need to perform maintenance, apply an update, or add a security patch. Such an action could be performed when a resource is free; but in a highly utilized cluster, it would be better to terminate a resource that is less important or not needed immediately. To this end, the Magellan team has developed an annotation method through which users can describe the impact of service actions and the conditions under which an action will have an acceptable impact on the user workload. An example of such an annotation is “Instance X can be rebooted during the interval between 10PM and 2AM.”
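To make the annotation idea concrete, the following is a minimal Python sketch of how such a reboot window might be represented and checked. It is not Poncho's actual interface: the `RebootWindow` class, its fields, and the `permits` method are hypothetical names chosen for illustration, and the only thing taken from the example above is the 10PM to 2AM interval.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class RebootWindow:
    """Hypothetical annotation: a daily interval in which a reboot is acceptable."""

    instance: str
    start: time  # inclusive
    end: time    # exclusive

    def permits(self, at: time) -> bool:
        """Return True if a reboot at local time `at` falls inside the window."""
        if self.start <= self.end:
            # Ordinary window that stays within one day, e.g. 01:00-05:00.
            return self.start <= at < self.end
        # Window that wraps past midnight, e.g. 22:00-02:00: it is the
        # union of [start, midnight) and [midnight, end).
        return at >= self.start or at < self.end


# The annotation from the text: "Instance X can be rebooted during the
# interval between 10PM and 2AM."
window = RebootWindow("instance-x", start=time(22, 0), end=time(2, 0))
late_evening_ok = window.permits(time(23, 30))
midday_ok = window.permits(time(12, 0))
```

An operator-side tool could consult a check like this before scheduling a reboot, and defer the action when the proposed time falls outside every window the user has declared.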

Poncho also includes a notification function through which users are told when their resources are affected by failures, resource contention, or administrative operations. Users can register to receive notifications of events such as “A reboot has been scheduled for xxx time.”
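The registration pattern described here can be sketched as a small publish/subscribe registry. The code below is an illustrative assumption, not Poncho's real notification API: the `Notifier` class, the `register` and `publish` methods, and the `"reboot_scheduled"` event name are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# A handler receives the affected instance name and a human-readable message.
Handler = Callable[[str, str], None]


class Notifier:
    """Hypothetical registry that maps event types to user-supplied handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event_type: str, handler: Handler) -> None:
        """Subscribe `handler` to every future event of `event_type`."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, instance: str, message: str) -> int:
        """Deliver an event to its subscribers; return how many were notified."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(instance, message)
        return len(handlers)


# A user registers interest in scheduled reboots; the operator publishes one.
received: List[Tuple[str, str]] = []
notifier = Notifier()
notifier.register("reboot_scheduled", lambda inst, msg: received.append((inst, msg)))
delivered = notifier.publish(
    "reboot_scheduled", "instance-x", "A reboot has been scheduled."
)
ignored = notifier.publish("disk_failure", "instance-x", "No one subscribed.")
```

Keeping registration keyed by event type means users hear only about the events they asked for, which matches the opt-in behavior described above.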

“APIs have long been used to provide services to users, but they can just as easily be used to provide services to operators,” said Desai. “This two-way approach enables users to communicate requirements and expectations to cloud operators unambiguously and allows system operators to reclaim resources and take other actions while minimizing user impact.”

Initial feedback from Magellan’s largest users has been positive. For example, one user with a throughput-dominated workload is using Poncho to formalize what had previously been only informal arrangements. Interactive users who consume resources in a “bursty” fashion report that they benefit from the added communication enabled by Poncho. These reactions suggest that users are willing to make use of such interfaces if they make life easier or enable more efficient use of computing resources.

For further information, see the paper:

S. Devoid, N. Desai, and L. Hochstein, “Poncho: Enabling Smart Administration of Full Private Clouds,” in Proceedings of the 27th International Conference on Large Installation System Administration (LISA ’13), USENIX Association, Berkeley, CA, 2013, pp. 17-26.