Abstracts for LCCWS2
BaBar Computing (Stephen J. Gowdy, SLAC)
BaBar has been taking data since 1999 and has accumulated almost 100
million BBbar decays (82 fb-1 on the Upsilon(4S) peak). In order to process
and analyse this data BaBar relies primarily on computing clusters located
in four "Tier-A" sites around the world. This talk will discuss how BaBar
uses these sites and issues related to a largely distributed collaboration.
CDF for Run II (Frank Wuerthwein, FNAL)
Run II at the Fermilab Tevatron Collider began in March 2001 and will
continue to probe the high energy frontier in particle physics until the
start of the LHC at CERN. It is expected that the CDF collaboration will
store up to 10 Petabytes of data onto tape by the end of Run II. Providing
efficient access to such a large volume of data for analysis by hundreds
of collaborators world-wide will require new ways of thinking about computing
in particle physics research. In this talk, I discuss the computing model
at CDF designed to address the physics needs of the collaboration. Particular
emphasis is placed on the current development of an O(1000)-node PC cluster at
Fermilab serving as the Central Analysis Facility for CDF and the vision
for incorporating this into a decentralized (GRID-like) framework.
D0 and SAM for Run II (Lee Lueking, FNAL)
SAM has been developed within the Computing Division at Fermilab as
a versatile, distributed, data management system. One of its many
features is its ability to control processing and manage a distributed cache
within a cluster of compute servers. Requirements, concepts, and features
of this system are described and issues involved in interfacing it to several
batch systems are discussed. The system is used within the
Dzero experimental collaboration to distribute hundreds of Terabytes of
data for processing and analysis around the world. Several hardware
configurations deployed at Fermilab are described. Data is currently
disseminated using this system to over two dozen sites worldwide, and this
number will grow to nearly one hundred in the coming years. The planned design
evolution to accommodate this growth is discussed, and the transition of
the system to grid standard middleware is described.
Managing a mature white box cluster at CERN (Tim Smith, CERN)
A large and diverse multi-vendor white box farm built up over the years
poses challenges which are uncharacteristic of new single-acquisition large
farms. The combination of diversity and scale prevents the application
of either of the extremes of 'vendor supplied' or 'hand-crafted' solutions
to the problems encountered. Vendor interventions are frequent, as are
acquisition cycles, which feeds a constant in- and out-flow of machines
involving a large number of people. Additionally, the provision of software
management tools has traditionally been distributed over multiple teams,
implying a large and diverse set of application delivery mechanisms. I will describe
how CERN is facing these challenges whilst reducing the number of hands
on the systems, through automation, standards, procedures, and the adoption
of workflows to manage machine life-cycles.
University Multidisciplinary Scientific Computing: Experience
and Plans (Alan Tackett, Vanderbilt University)
The significant investment made in information technology research over
the past two decades has been justified in large part by the desire to enable
great science through the "transformative power of computational resources."
This investment is having the desired effect, and high-end cyberinfrastructure
is becoming essential for researchers in many disciplines.
At Vanderbilt, researchers in genetics, structural biology, and high energy
physics who felt that their rate of discovery was constrained by the paucity
of available computational resources have established a campus scientific
computing center. This grass-roots initiative has been a significant
success, and new disciplines are joining. We will describe the organizational
and resource allocation models for this center, our recent experience, and
plans for the future.
Building a Computer Centre (Tony Cass, CERN)
Although building a computer centre looks simple, many sites find that they do not have the necessary
infrastructure to support a large number of PCs. In particular, the importance
of ensuring adequate air conditioning, and not just adequate floor space,
is often overlooked. The speaker will review such issues based on experience
gained in planning the upgrade of CERN's physical infrastructure to support
the needs of LHC computing.
A Pre-Production Update on the NSF TeraGrid (Remy Evard, ANL)
The TeraGrid Project is the National Science Foundation's next-generation
infrastructure in support of scientific computation. When completed,
the TeraGrid will include 13.6 teraflops of Linux cluster computing power
distributed at the four TeraGrid sites, facilities capable of managing and
storing more than 450 terabytes of data, high-resolution visualization environments,
and toolkits for grid computing.
While the TeraGrid will soon be expanded to include other sites, Argonne
National Laboratory is one of the four initial TeraGrid participants, along
with the National Center for Supercomputing Applications, the San Diego
Supercomputer Center, and the California Institute of Technology.
TeraGrid is scheduled for production in spring of 2003. In this talk,
I will describe the TeraGrid project, the planned infrastructure, and the
current status of the project, including a focus on the issues we've encountered
thus far.
Grid/Fabric interaction (discussion led by Bernd Panzer-Steindel,
CERN)
In principle the middleware would be a service layer 'above' the Fabric,
but it seems that in practice there are quite a few interactions imposing
possible constraints on the Fabric design itself: storage cache layers,
security (network visibility of worker and storage nodes), 'home' directory
requirements, support for different Linux versions in the Fabric and on the
Grid services, etc. Is this really a problem (compromises)? Who adapts to
whom? What is the experience so far? Can and should one influence the
developments? Should the Fabric impose strict rules?
European DataGrid Fabric Management (Olof Barring, CERN)
The fabric management workpackage of the EU DataGrid project has as an
objective to provide the tools to automate the management of large computer
fabrics. This talk, which provides an update on the status and plans I presented
at the LCCWS in May 2001, is structured as follows: A short introduction of
the EU DataGrid project will be followed by a detailed description of the
architecture of the automated fabric management. Thereafter, the status and
lessons learned from the first one and a half years of the project's three-year
duration will be presented. Finally, I will describe our current developments
and immediate plans up to March 2003.
Evaluation of a MOSIX cluster as a group analysis facility at CDF
(T. Kim, A. Korn, Ch. Paus, M. Neubauer, D. Waters, F. Wuerthwein; FNAL)
Faced with the task of choosing a model for group computing at CDF, we
built a 30-CPU MOSIX cluster using commodity PC hardware. MOSIX (Multi-computer
Operating System for UnIX) is an extension to the Linux kernel that allows
for transparent process migration and dynamic load balancing. It was
developed at the Hebrew University of Jerusalem School of Computer Science.
We have operated such a cluster for 1.5 years and it has been extensively
used for data analysis by a group of 10-15 people. We describe our
experiences with the current system, its advantages and shortcomings.
We comment on performance and scalability.
The CMS Tier 1 computing center at Fermilab (Hans Wenzel, FNAL)
Currently we are building a CMS Tier 1 computing center at Fermilab. Among
other activities we participate in distributed Monte Carlo Production,
provide resources for interactive and batch computing for user analysis,
prepare to be part of the emerging LHC computing grid and prepare to provide
computing in the LHC era.
We will describe the current installation at Fermilab and the
technology choices we made:
- the dCache system, which makes a multi-terabyte server farm look
like one coherent and homogeneous storage system and provides rate adaption
between the application and tertiary storage (Enstore);
- LVS and FBSNG, two approaches to providing load-balanced logins
to a farm of Linux computers for data analysis;
- the FBSNG batch system for farms;
- Disk Farm, which makes the data disks attached to individual farm nodes
look like one coherent and homogeneous storage system.
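The load-balanced login approach mentioned above can be illustrated with a minimal LVS (Linux Virtual Server) sketch using the standard ipvsadm tool; the addresses and scheduling choice below are hypothetical illustrations, not the actual Fermilab configuration:

```shell
# Minimal LVS sketch for load-balanced interactive logins, run as root on
# the director node. All IP addresses here are hypothetical examples.

# Create a virtual SSH service on the cluster's public address, dispatching
# new connections round-robin ("-s rr") across the real nodes.
ipvsadm -A -t 192.0.2.10:22 -s rr

# Register two analysis nodes behind the virtual service (NAT forwarding).
ipvsadm -a -t 192.0.2.10:22 -r 10.0.0.1:22 -m
ipvsadm -a -t 192.0.2.10:22 -r 10.0.0.2:22 -m

# Inspect the resulting virtual-server table.
ipvsadm -L -n
```

Each new SSH connection to the virtual address is then handed to one of the real nodes in turn; a production set-up would typically add weighted scheduling and health checking of the real servers.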
FBSNG and Disk Farm - parts of a large cluster infrastructure (Igor
Mandrichenko, FNAL)
FBSNG is a farm batch system designed and developed at FNAL for Run II
data processing on large PC Linux farms. It has been successfully used by
Run II and fixed-target experiments at FNAL for several years. Disk Farm is
a system which organizes disk space distributed over a large number of farm
nodes into a single logical name space. Disk Farm features include a UNIX
file-system-like user interface, data replication, and load balancing. The article
briefly presents the two systems and summarizes the current status of, and
plans for, making them Grid-aware.
The GridKa Installation for HEP Computing (Dr. Holger Marten,
Forschungszentrum Karlsruhe)
At the beginning of 2002, Forschungszentrum Karlsruhe put the first prototype
of the "Grid Computing Centre Karlsruhe" (GridKa) into operation. GridKa
is designed to attack compute- and data-intensive problems in high energy physics
and other natural sciences in an international network of Grid Computing Centres.
It currently serves as a test facility for LHC Computing, a testbed set-up
for the EU-project CrossGrid, and it provides a production environment for
BaBar, CDF, D0 and Compass. GridKa will reach its second prototype stage in
October 2002, with approximately 250 Linux processors, 50 TB of online and
100 TB of tape capacity. The talk gives an introduction to the current
set-up, first experiences,
and future upgrades of GridKa.