Goals: The GriPhyN (Grid Physics Network) project brings together an outstanding team of information technology (IT) researchers and experimental physicists to provide the IT advances required to enable Petabyte-scale data-intensive science in the 21st century. The project is driven by unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. These requirements arise initially from the four physics experiments involved in this project, but they will also be fundamental to science and commerce in the 21st century. To meet them, the GriPhyN team will pursue IT advances centered on the creation of Petascale Virtual Data Grids (PVDG) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe. The iVDGL project is tasked with establishing and using an international Virtual Data Grid Laboratory (iVDGL) of unprecedented scale and scope. The laboratory comprises heterogeneous computing and storage resources in the U.S., Europe and, ultimately, other regions, linked by high-speed networks and operated as a single system for interdisciplinary experimentation in Grid-enabled, data-intensive scientific computing. Our goal in establishing this laboratory is to drive the development, and the transition to everyday production use, of the Petabyte-scale virtual data applications required by frontier, computationally oriented science. In so doing, we seize the opportunity presented by a convergence of rapid advances in networking, information technology, Data Grid software tools, and application sciences, as well as substantial investments in data-intensive science now underway in the U.S., Europe, and Asia.
Significance: The data analysis for these experiments presents enormous IT challenges. Communities of thousands of scientists, distributed globally and served by networks of varying bandwidths, need to extract small signals from enormous backgrounds via computationally demanding analyses of datasets that will grow from the 100 Terabyte scale to the 100 Petabyte scale over the next decade. For both technical and strategic reasons, the required computing and storage resources will be distributed across national centers, regional centers, university computing centers, and individual desktops. The scale of this task far outpaces our current ability to manage and process data in a distributed environment and requires fundamental advances in many areas of computer science.
Accomplishments: To meet these challenges, GriPhyN and iVDGL will pursue an aggressive program of fundamental IT research focused on realizing the concept of Virtual Data. Virtual Data encompasses the definition of a (potentially unlimited) virtual space of data products derived from experimental data, and the delivery of those products to a large community. In this virtual data space, a request can be satisfied by direct access to existing data, by computation, or by a combination of the two, with local and global resource management, policy, and security constraints determining the strategy used. Realizing the Virtual Data concept requires advances in three major areas, which GriPhyN will target: virtual data technologies; policy-driven request planning and scheduling of networked data and computational resources; and management of transactions and task execution across national-scale and worldwide virtual organizations.
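The core Virtual Data idea, satisfying a request either by direct access to a stored copy or by recomputing it from its inputs under some policy, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GriPhyN planner: the names `Derivation` and `VirtualDataCatalog`, the single scalar cost per operation, and the `compute_budget` limit standing in for "policy" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Derivation:
    """Hypothetical recipe describing how to compute a data product from its inputs."""
    inputs: List[str]
    transform: Callable[..., Any]
    cost: float  # hypothetical estimated cost of running the transform


@dataclass
class VirtualDataCatalog:
    """Hypothetical catalog of materialized products plus recipes for virtual ones."""
    replicas: Dict[str, Any] = field(default_factory=dict)
    derivations: Dict[str, Derivation] = field(default_factory=dict)
    fetch_cost: float = 1.0  # hypothetical cost of reading an existing replica

    def request(self, name: str, compute_budget: float = float("inf")) -> Any:
        recipe = self.derivations.get(name)
        if name in self.replicas:
            # Direct access, unless recomputation is both cheaper and allowed by policy.
            if recipe is None or recipe.cost >= self.fetch_cost or recipe.cost > compute_budget:
                return self.replicas[name]
        if recipe is None:
            raise KeyError(f"{name!r} is neither materialized nor derivable")
        if recipe.cost > compute_budget:
            # Stand-in for a policy constraint: this step may not be recomputed.
            raise PermissionError(f"deriving {name!r} exceeds the compute budget")
        # Resolve inputs recursively (each may itself be virtual), then compute.
        args = [self.request(dep, compute_budget) for dep in recipe.inputs]
        product = recipe.transform(*args)
        self.replicas[name] = product  # materialize the result for later requests
        return product


if __name__ == "__main__":
    catalog = VirtualDataCatalog(fetch_cost=1.0)
    catalog.replicas["raw"] = [1.0, 2.0, 3.0]  # measured data, stored directly
    catalog.derivations["calibrated"] = Derivation(["raw"], lambda xs: [2 * x for x in xs], cost=5.0)
    catalog.derivations["summary"] = Derivation(["calibrated"], sum, cost=2.0)
    print(catalog.request("summary"))  # computed on first request, then materialized
```

Requesting `"summary"` here triggers computation of `"calibrated"` and then `"summary"`. Both are stored, so a repeated request is served by direct access. A real planner would weigh network bandwidth, replica location, and security policy rather than one scalar cost. The recursive resolution is only meant to show how a derived product can be "virtual" until someone asks for it.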