To support distributed physics analysis on the scale foreseen by the LHC experiments, 'Grid' systems are needed that manage and streamline data distribution, replication, and synchronization. We report on the development of a tool that allows large physics datasets to be managed and replicated at the granularity of single objects. Efficient and convenient support for data extraction and replication at the level of individual objects and events will enable types of interactive data analysis that would be too inconvenient or costly to perform with tools that work at the file level only.

Our tool development effort is intended both as a demonstrator project for various types of existing Grid technology and as a research effort to develop Grid technology further. The basic use case supported by our tool is one in which a physicist repeatedly selects some physics objects located at a central repository and replicates them to a more local site. The selection can be done using 'tag' or 'ntuple' analysis at the local site. The tool replicates the selected objects and merges all replicated objects into a single coherent 'virtual' dataset. This allows all objects to be used together seamlessly, even if they were replicated at different times or from different locations.
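
The select, replicate, and merge cycle described above can be illustrated with a minimal sketch. It is not the tool's actual interface: the class `VirtualDataset`, its `replicate` method, the dictionary standing in for the central repository, and the site names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDataset:
    """Hypothetical local view that merges objects replicated in separate batches."""
    objects: dict = field(default_factory=dict)  # object id -> payload
    origin: dict = field(default_factory=dict)   # object id -> site it was copied from

    def replicate(self, repository, selected_ids, site):
        """Copy the selected objects that are not yet held locally.

        Returns the ids that were actually transferred in this call.
        """
        fetched = []
        for oid in selected_ids:
            if oid in self.objects:
                continue  # already replicated in an earlier batch
            self.objects[oid] = repository[oid]
            self.origin[oid] = site
            fetched.append(oid)
        return fetched


# Assumed stand-in for a central repository: object id -> payload.
central = {i: f"event-{i}" for i in range(10)}

local = VirtualDataset()
first = local.replicate(central, [1, 3, 5], site="central")
second = local.replicate(central, [3, 5, 7], site="mirror")

# All replicated objects are visible through one coherent local view.
merged_ids = sorted(local.objects)
```

In this sketch the second selection overlaps the first, so only the object that is still missing is transferred, and both batches are then read through the same local view regardless of when or from where they were copied.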

The version of the tool reported on in this paper replicates ORCA-based physics data created by CMS in its ongoing high-level trigger design studies. The basic capabilities and limitations of the tool are discussed, together with some performance results. Some tool internals are also presented. Finally, we report on experiences so far and on future plans.

