Back to Table of contents

Primeur weekly 2016-03-07

Focus

Azerbaijan e-infrastructure is developing fast thanks to project partnership with European Community ...

Quantum computing

Quantum leap for new UK centre ...

British minister announces GBP 204 million investment in doctoral training and Quantum Technologies science ...

Surrey's GBP 3 million grant puts the UK in pole position in the race to quantum technologies ...

Focus on Europe

Highly realistic virtual neurons fruit of an Allen Institute and Blue Brain collaboration ...

Middleware

Bright Computing announces partnership with France-based Alineos ...

Adaptive Computing launches high productivity remote visualization ...

Hardware

Mellanox delivers next generation network processor to key telco customers ...

TE Connectivity signs enterprise license agreement with ANSYS ...

OCF delivers new 181 Teraflop/s HPC resource for University College London ...

Cray appoints Fred Kohout as Chief Marketing Officer ...

The University of Rijeka takes a leap and unveils the most powerful supercomputer in the Adriatic region, designed by Bull ...

DDN continues run as global leader in 2015 TOP500 list of global supercomputing centres ...

Applications

SKODA AUTO drives innovation and automotive advancements powered by new SGI supercomputer ...

IBM and New York Genome Center to create comprehensive, open cancer data repository to tap cognitive insights from Watson ...

Updated workflows for new LHC ...

Cornell opens $25 million NSF platform for discovering new materials ...

World-leading supercomputer instrumental in discovery of proof point for Albert Einstein's general theory of relativity by Cardiff University ...

World's first parallel computer based on biomolecular motors ...

The Cloud

American Sleep Apnea Association and IBM launch SleepHealth app ...

Oracle unveils first data protection solution with seamless Cloud tiering ...

Updated workflows for new LHC


A view inside the liquid-argon calorimeter endcap - Photo Credit: The ATLAS Experiment at CERN.
22 Feb 2016 Berkeley - After a massive upgrade, the Large Hadron Collider (LHC), the world's most powerful particle collider, is now smashing particles at an unprecedented 13 tera-electron-volts (TeV) - nearly double the energy of its previous run from 2010 to 2012. In just one second, the LHC can now produce up to 1 billion collisions and generate up to 10 gigabytes of data in its quest to push the boundaries of known physics. And over the next decade, the LHC will be further upgraded to generate about 10 times more collisions and data.

To deal with the new data deluge, researchers working on one of the LHC's largest experiments - ATLAS - are relying on updated workflow management tools developed primarily by a group of researchers at the Lawrence Berkeley National Laboratory. Papers highlighting these tools were recently published in the Journal of Physics: Conference Series.

"The issue with High Luminosity LHC is that we are producing ever-increasing amounts of data, faster than Moore's Law and cannot actually see how we can do all of the computing that we need to do with the current software that we have", stated Paolo Calafiura, a scientist in Berkeley Lab's Computational Research Division (CRD). "If we don't either find new hardware to run our software or new technologies to make our software run faster in ways we can't anticipate, the only choice that we have left is to be more selective in the collision events that we record. But, this decision will of course impact the science and nobody wants to do that."

To tackle this problem, Paolo Calafiura and his colleagues in the Berkeley Lab ATLAS Software group are developing new software tools called Yoda and AthenaMP that speed up the analysis of ATLAS data by leveraging the capabilities of next-generation Department of Energy (DOE) supercomputers like the National Energy Research Scientific Computing Center's (NERSC's) Cori system, as well as DOE's current Leadership Computing Facilities.

Around the world, researchers rely on the LHC Computing Grid to process the petabytes of data collected by the LHC detectors every year. The Grid comprises 170 networked computing centres in 36 countries. The computing centre at CERN, where the LHC is located, is 'Tier 0' of the Grid. It processes the raw LHC data and then divides it into chunks for the other tiers. Twelve 'Tier 1' computing centres accept the data directly from CERN's computers, process the information further and then break it down into even more chunks for the hundreds of computing centres further down the Grid. Once a computer finishes its analysis, it sends the findings to a centralized computer and accepts a new chunk of data.
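As a rough illustration of that tiered fan-out, the short Python sketch below splits a batch of events for twelve Tier 1 centres and then again for the centres below them. It is a toy model with made-up numbers and names, not code from the LHC Computing Grid.

# Toy sketch of the tiered fan-out described above; illustrative only,
# not code from the LHC Computing Grid.
def split(data, n_parts):
    """Divide a list of events into roughly equal chunks."""
    step = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + step] for i in range(0, len(data), step)]

raw_events = list(range(1000))                 # stand-in for raw data at Tier 0
tier1_chunks = split(raw_events, 12)           # twelve Tier 1 centres
tier2_chunks = [split(chunk, 10) for chunk in tier1_chunks]  # further fan-out
print(len(tier1_chunks), "Tier 1 chunks,",
      sum(len(c) for c in tier2_chunks), "Tier 2 chunks")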

Like an air traffic controller, special software manages the workflow on the computing Grid for each of the LHC experiments. The software is responsible for breaking down the data, directing it to its destination, and telling systems on the Grid when to execute an analysis and when to store information. To deal with the added deluge of data from the LHC's upgraded ATLAS experiment, Vakhtang Tsulaia from the Berkeley Lab ATLAS Software group added another layer of software to the Grid called the Yoda Event Service system.

The researchers note that the idea with Yoda is to replicate the LHC Computing Grid workflow on a supercomputer. As soon as a job arrives at the supercomputer, Yoda breaks the data chunk down into even smaller units, representing individual events or event ranges, and then assigns those smaller jobs to different compute nodes. Because only the portion of the job that will be processed is sent to the compute node, computing resources no longer need to stage the entire file before executing a job, so processing happens relatively quickly.
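A minimal Python sketch of that pattern follows. It is not the actual Yoda implementation; names such as EventRange, split_into_ranges and dispatch_ranges are illustrative assumptions. The point is simply that workers receive small event ranges rather than whole files.

# Minimal sketch (not the actual Yoda code) of splitting an input data chunk
# into event ranges and handing them to workers as they ask for more work.
from collections import deque
from dataclasses import dataclass

@dataclass
class EventRange:
    file_guid: str    # identifier of the input file the events live in
    first_event: int  # index of the first event in the range
    n_events: int     # number of events to process in this range

def split_into_ranges(file_guid, total_events, range_size):
    """Break one data chunk into small, independently processable event ranges."""
    return [EventRange(file_guid, start, min(range_size, total_events - start))
            for start in range(0, total_events, range_size)]

def dispatch_ranges(ranges, worker_ids, process):
    """Assign the next pending range to whichever worker is free."""
    pending = deque(ranges)
    idle = deque(worker_ids)
    results = []
    while pending:
        worker = idle.popleft()
        event_range = pending.popleft()
        # Only this range (not the whole file) is shipped to the worker,
        # so no full-file staging is needed before processing starts.
        results.append(process(worker, event_range))
        idle.append(worker)
    return results

if __name__ == "__main__":
    ranges = split_into_ranges("chunk-000.root", total_events=1000, range_size=50)
    out = dispatch_ranges(ranges, worker_ids=range(8),
                          process=lambda w, r: (w, r.first_event, r.n_events))
    print(f"processed {len(out)} event ranges")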

To efficiently take advantage of available HPC resources, Yoda is also flexible enough to adapt to a variety of scheduling options - from backfilling to large time allocations. After processing the individual events or event ranges, Yoda saves the output to the supercomputer's shared file system so that these jobs can be terminated at any time with minimal data loss. This means that Yoda jobs can now be submitted to the HPC batch queue in backfill mode. So if the supercomputer is not utilizing all of its cores for a certain amount of time, Yoda can automatically detect that and submit a properly sized job to the batch queue to utilize those resources.
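The two ideas in that paragraph - saving each finished event range to the shared file system so a terminated job loses almost nothing, and sizing a job to fit an idle backfill window - can be sketched in Python as follows. The path and function names are assumptions made for illustration, not the real Yoda interfaces.

# Sketch only: per-range checkpointing to a shared file system and
# backfill-sized job requests. Names and paths are illustrative.
import json
from pathlib import Path

SHARED_OUTPUT_DIR = Path("/global/scratch/atlas/yoda_output")  # assumed path

def save_range_output(job_id, range_id, payload):
    """Persist one event range's result as soon as it is done."""
    SHARED_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    out_file = SHARED_OUTPUT_DIR / f"{job_id}.range{range_id}.json"
    out_file.write_text(json.dumps(payload))

def completed_ranges(job_id):
    """On restart, skip ranges whose output already exists on the shared FS."""
    return {int(p.stem.split("range")[-1])
            for p in SHARED_OUTPUT_DIR.glob(f"{job_id}.range*.json")}

def size_backfill_job(idle_cores, idle_minutes, cores_per_node, minutes_per_range):
    """Pick a node count and range budget that fit the current backfill window."""
    nodes = max(1, idle_cores // cores_per_node)
    ranges_per_node = max(1, idle_minutes // minutes_per_range)
    return {"nodes": nodes, "ranges_per_node": ranges_per_node}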

"Yoda acts like a daemon that is constantly submitting jobs to take advantage of available resources, this is what we call opportunistic computing", stated Paolo Calafiura.

In early 2015, the team tested Yoda's performance by running ATLAS jobs from the previous LHC run on NERSC's Edison supercomputer, successfully scaling up to 50,000 processor cores.

In addition to Yoda, the Berkeley Lab ATLAS software group also developed the AthenaMP software that allows the ATLAS reconstruction, simulation and data analysis framework to run efficiently on massively parallel systems.

"Memory has always been a scare resource for ATLAS reconstruction jobs. In order to optimally exploit all available CPU-cores on a given compute node, we needed to have a mechanism that would allow the sharing of memory pages between processes or threads", stated Paolo Calafiura.

AthenaMP addresses the memory problem by leveraging the Linux fork and copy-on-write mechanisms. So when a node receives a task to process, the job is initialized on one core and sub-processes are forked to other cores, which then process all of the events assigned to the initial task. This strategy allows for the sharing of memory pages between event processors running on the same compute node.
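As a simple illustration of fork plus copy-on-write - not AthenaMP itself - the Python sketch below has a parent process build a large read-only structure once and then fork workers that read it. On Linux the workers share the parent's memory pages, and only pages they modify are copied.

# Illustration of fork/copy-on-write memory sharing (Linux/macOS only).
# This is a sketch, not AthenaMP; data sizes and names are made up.
import os
import sys

def load_shared_data():
    # Stand-in for the large read-only state an ATLAS job initializes once.
    return list(range(5_000_000))

def worker(rank, shared_data, events):
    # Reading shared_data does not copy its pages; only writes would.
    total = sum(shared_data[e % len(shared_data)] for e in events)
    print(f"worker {rank}: processed {len(events)} events, checksum {total}")
    sys.exit(0)  # end the child here so it does not fall back into the parent loop

def run(n_workers=4, events_per_worker=1000):
    shared_data = load_shared_data()          # initialized once, on one core
    for rank in range(n_workers):
        events = range(rank * events_per_worker, (rank + 1) * events_per_worker)
        if os.fork() == 0:                    # child process: copy-on-write view
            worker(rank, shared_data, events)
    for _ in range(n_workers):                # parent waits for all children
        os.wait()

if __name__ == "__main__":
    run()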

By running ATLAS reconstruction in one AthenaMP job with several worker processes, the team notes that they achieved a significantly reduced overall memory footprint compared to running the same number of independent serial jobs. For certain configurations of the ATLAS production jobs, they have managed to reduce memory usage by a factor of two.

"Our goal is to get onto more hardware and these tools help us do that. The massive scale of many high performance systems means that even a small fraction of computing power can yield large returns in processing throughput for high energy physics", stated Paolo Calafiura.

This work was supported by DOE’s Office of Science.

More information is available in the papers "Fine grained event processing on HPCs with the ATLAS Yoda system" and "Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)".
Source: Lawrence Berkeley National Laboratory
