Publications
Y. Zhao, M. Wilde, and I. Foster, "Applying the Virtual Data Provenance Model," Preprint ANL/MCS-P1355-0606, June 2006. [pdf]
In many domains of science, engineering, and commerce, data analysis systems are employed to derive knowledge from datasets describing experimental results or simulated phenomena. To support such analyses, we have developed a "virtual data system" in which a uniform notation is used to request the invocation of data transformation procedures and to record how every result derived by the system was produced. We maintain such prospective and retrospective information in an integrated schema alongside semantic annotations, and thus enable a powerful query capability in which the rich semantic information implied by knowledge of the structure of data derivative procedures can be exploited to provide an information environment that fuses recipe, history, and application-specific semantics. We provide here an overview of this integration, the queries and transformations that it provides, and examples of how these capabilities can service the scientific process.
