Subject: File replication catalog requirements for LHC grid systems
Date: Thu, 22 Mar 2001 00:23:02 -0800 (PST)
From: Koen Holtman
Reply-To: Koen Holtman

Hi all,

This is a first draft of file replica catalog requirements, recently
requested by Bill Allcock as input for future Globus/EU datagrid
discussions.  Note the status of the document.

Cheers,

Koen.


File replication catalog requirements for LHC grid systems
----------------------------------------------------------

Koen Holtman, 22 Mar 2001

** Overview

These are numerical requirements estimates for the file replica
catalog system operating in an LHC data grid.

There are large error bars on these estimates: they could easily be
off by factors of 10-100.  However, these estimates are probably as
good as anybody can make them now.

** Status of this document

Working document, initial draft.

Currently these are preliminary CMS estimates.  To make these numbers
part of official requirements, they need to go through some review
cycles in the LHC experiments.  I do not currently expect that such a
review will lead to significant changes.

The discussion of catalog availability requirements (allowed failure
rates, downtimes) is planned for a future revision of this document.

Feedback is appreciated.  Are additional numerical parameters needed?

** Introductory notes

The terminology used here is that of the Globus replica catalog
documentation [REPLCAT].

These estimates are for numerical requirements on the file replica
catalog system operating inside the production data grid system of an
LHC experiment.

This document makes no assumptions about the implementation --
distributed or not, partitioned or not -- of the catalog system.  It
treats the file replica catalog as a single big box, and counts the
operations on that box.

Note also that grid applications may not interact with the file
replica catalog directly: they might make object-level requests to
another grid component, which then calls the file catalog.

Some baseline assumptions are made to reach requirements that reflect
the maximum expected load on the catalog.  In particular, it is
assumed that all files maintained by all collaborators in the
experiment are in the catalog (unlikely to be true in practice) and
also that every executable that is run will look up every file it
opens in the catalog first.

It is also assumed, again to reach a maximum expected load, that
whenever some data about a logical file is looked up in the catalog,
this is done in a single operation: a separate interaction with the
catalog system, concerning only that logical file.  The requirements
below are therefore that the catalog is fast enough even without the
speedups that could come from multi-file aggregation in operations.
All operation rates below are rates of single-file operations.  In
practice catalog users might be able to aggregate some of these
operations into multi-file operations if a multi-file operation API is
provided, but such aggregation cannot be assumed to occur so
frequently that it will `save' a catalog system that is too slow to
support the single-file operation rates specified below.

To obtain the numbers below, the following strategy is used: the use
of files in BaBar is estimated first, and this is then scaled up to an
LHC experiment.

Thanks go to Heinz Stockinger, Andy Hanushevsky, and Anders Ryd for
feedback on these numbers.

** Numerical requirements

Four cases are considered; the central one for this document is
case 3.

* Interpretation of the numbers

The numbers below are the expected replica catalog usage
characteristics in four different grid scenarios, that is, the
expected actual usage under the assumptions about maximum use of the
replica catalog stated above.

The numbers below do not have a safety factor built in.  For example,
the requirement to store N files means that about N files are really
expected to be stored, not that far fewer files are expected and N was
chosen to be on the safe side.

Operation rates are given as `peak rate of operations to be completed
per minute'.  The catalog system should be able to complete at least
this many operations in one minute.  However, the initiation of
operations by catalog users follows a stochastic, bursty pattern.  The
catalog should deal somewhat gracefully with such bursts, that is,
with short periods in which the initiation rate is significantly above
the specified peak completion rate.  Examples of gracefully dealing
with a burst are to queue the requests internally or to tell users to
wait and retry.

In the worst case of burstiness, all catalog users will initiate a
catalog operation at exactly the same point in time.  The number N of
catalog users is therefore also specified below.  The catalog service
_is_ allowed to crash if more than some X (X<=N) catalog users
initiate a catalog operation at exactly the same time.  However, this
X needs to be so large that the probability of such a burst ever
happening in practice is low enough that the mean time between crashes
due to such bursts is acceptable.  (The acceptable mean time between
crashes is to be specified in a future version of this document.)

* Error bars, implementation safety factor

Note again that there are large error bars on these estimates: they
could easily be off by factors of 10-100.  An entirely reasonable
safety factor in the planning of a catalog system implementation is
therefore a factor of 10 on top of the numbers below.

* Case 1: CMS production needs for 2001

Estimates of the file replica catalog workload produced by CMS
distributed production efforts in 2001.  The replication of production
files is done by CMS using the GDMP tool [GDMP], which will interface
with a replica catalog.

  number of logical files in the catalog: 10,000
  highest number of replicas per logical file: 10
  peak rate of updates to be completed
    (all as single-file operations): 5/minute
  peak rate of lookups to be completed
    (all as single-file operations): 20/minute
  time allowed for updates to become visible to new lookups:
    60 seconds
  number of catalog users: 25

* Case 2: BaBar thought experiment

If the BaBar collaboration did all of its file accesses through a
single replica catalog system, that system would see the following
load in 2001 [Warning: this is a CMS estimate of BaBar!]:

  number of logical files in the catalog: 1,000,000 - 2,000,000
  highest number of replicas per logical file: 1-50
  peak rate of updates to be completed
    (all as single-file operations): 60/minute
  peak rate of lookups to be completed
    (all as single-file operations): 60,000/minute
  time allowed for updates to become visible to new lookups:
    5 seconds in local area (the site where the update operation
    was invoked), 30 seconds in wide area
  number of catalog users: 1000

* Case 3: single LHC experiment in 2006

Below are scaling factors with respect to case 2, for a single LHC
experiment in 2006.  The number of catalog users is given as an
absolute value.

  number of logical files in the catalog: X20
  highest number of replicas per logical file: X10
  peak rate of updates to be completed
    (all as single-file operations): X10
  peak rate of lookups to be completed
    (all as single-file operations): X10
  time allowed for updates to become visible to new lookups: same
  number of catalog users: 10,000

* Case 4: single LHC experiment in 2010

Below are scaling factors for an LHC experiment in 2010, with respect
to case 3.

  number of logical files in the catalog: X10
  highest number of replicas per logical file: X2
  peak rate of updates to be completed
    (all as single-file operations): X5
  peak rate of lookups to be completed
    (all as single-file operations): X5
  time allowed for updates to become visible to new lookups: same
  number of catalog users: X2

** References

[REPLCAT] Globus replica catalog documentation, available from the
          page http://www.globus.org/datagrid/deliverables/

[GDMP]    http://cmsdoc.cern.ch/cms/grid/
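
As an illustration only, the scaling factors of cases 3 and 4 can be
applied to the case 2 numbers to obtain absolute values.  The short
Python sketch below does this.  The dictionary keys and the scale()
helper are names made up for this example; they are not part of any
catalog API.  For the replica count, the upper end (50) of the case 2
range is taken as the baseline, which is an assumption.  No safety
factor is applied.

```python
# Case 2 (BaBar thought experiment) baseline, taken from the text above.
# Assumption: the upper end of the "1-50" replica range is used.
CASE2 = {
    "files_low": 1_000_000,
    "files_high": 2_000_000,
    "max_replicas": 50,
    "updates_per_min": 60,
    "lookups_per_min": 60_000,
    "users": 1_000,
}


def scale(base, factors, overrides=None):
    """Multiply each entry of base by its factor (default 1), then
    apply any absolute overrides.  Hypothetical helper for this sketch."""
    scaled = {key: value * factors.get(key, 1) for key, value in base.items()}
    scaled.update(overrides or {})
    return scaled


# Case 3: factors with respect to case 2; the user count is absolute.
CASE3 = scale(
    CASE2,
    {"files_low": 20, "files_high": 20, "max_replicas": 10,
     "updates_per_min": 10, "lookups_per_min": 10},
    overrides={"users": 10_000},
)

# Case 4: factors with respect to case 3.
CASE4 = scale(
    CASE3,
    {"files_low": 10, "files_high": 10, "max_replicas": 2,
     "updates_per_min": 5, "lookups_per_min": 5, "users": 2},
)

if __name__ == "__main__":
    for name, case in (("case 2", CASE2), ("case 3", CASE3), ("case 4", CASE4)):
        print(name)
        for key, value in case.items():
            print(f"  {key}: {value:,}")
```

Under these assumptions case 3 comes out at 20-40 million logical
files, 600 updates/minute and 600,000 lookups/minute, and case 4 at
200-400 million logical files, 3,000 updates/minute and 3,000,000
lookups/minute.  The implementation safety factor of 10 discussed
above would come on top of these values.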