It was a sentimental day for many of the scientists and Oak Ridge Leadership Computing Facility (OLCF) support staff who worked on Titan.
"When we had the shutdown day - when it exhaled its last breath and the fans all went down - we had an event in the room", stated Paul Abston, data centre manager at the National Center for Computational Sciences. "There were people in the room who were emotionally tied to the machine. It had been how they had made their living since 2012: maintaining Titan, doing work on Titan, facilitating scientific users from all over the country on Titan."
Although its 27-petaflop performance was once rated the fastest in the world, Titan had to make way for the OLCF's incoming exascale machine, Frontier, which promises to be 50 times faster once it comes online in 2021. Technology marches on.
But what would be Titan's ultimate fate?
Thankfully, it did not include a trip to the landfill. Instead, Titan was carefully removed, trucked across the country to one of the largest IT asset conversion companies in the world, and disassembled for recycling in compliance with the international Responsible Recycling (R2) Standard. This huge undertaking required diligent planning and execution by Oak Ridge National Laboratory (ORNL), Cray, and Regency Technologies.
What is the first step in removing a 200-cabinet supercomputer that fills a 9000-square-foot data centre? Unplug it.
Second step: figure out how best to evacuate the more than 10,000 pounds of R134a refrigerant that was used to cool Titan.
For that particular task, which took 3 days and 50-odd storage cylinders, an outside vendor was hired. Otherwise, the entire job of packing up Titan for its trip to Regency's recycling facility belonged to its manufacturer, Cray. Through its Take-Back Programme, Cray recycles its decommissioned supercomputers. When notified that a Cray system is due to be retired, Cray's on-site engineers work with site-planning engineers in Chippewa Falls, Wisconsin, to develop a project plan for removal. Once removal is scheduled, Cray sends a team to disassemble the system with an eye toward protecting the environment and data security.
"As with any electronics, recycling is the responsible thing to do", stated Craig Webb, a senior manager of logistics at Cray who oversees the Take-Back Programme. "The environmental and economic value of recycling has replaced the 'old school' destruction path that was more prevalent in the early age of big iron. Going way back in Cray's history, we had a customer in India that buried a system in the ground."
Fortunately, no components required burial in the course of uninstalling Titan. Once Titan's coolant was completely emptied, the next step was to cut the supercomputer's many connections to the building itself. The electrical infrastructure was the first to go in order to ensure the system was safe to work on. Then all the mechanical piping was locked out to avoid sudden waterfalls - in addition to R134a, Titan was also kept cool by chilled water piped to heat exchangers. Next, the bundles of underfloor fiber-optic and Cat 6 cabling were severed.
"To be honest, taking it apart was not as intense as you would imagine, when you think about what Titan does - just nuts and bolts and cables and plugs", Paul Abston stated.
Next, the heat exchangers (or "top hats") crowning the cabinets were removed to make it easier to roll the cabinets through doorways. Then began the sweatier work: hefting each complete cabinet to the loading dock, where it would be placed on a pallet for shipping. Over the course of 23 days, eight Cray employees hauled 430,000 pounds' worth of Titan components (and related infrastructure) onto 140 pallets, two skids holding 80 gaylord packing crates, and four trim crates. Finally, these sizable fragments of Titan were loaded into 15 semi-trucks for a journey of nearly 1000 miles.
Titan's next stop was at Regency's facility in Dallas, which Cray uses to handle all of its domestic hardware recycling. Here, the massive system's components were reduced to recyclable bits and pieces.
Based in Stow, Ohio, Regency has eight facilities around the country (totalling a million square feet) and specializes in "IT asset disposition" - which is to say, it does the physical work of breaking down electronic devices and systems into their elemental parts for recycling. According to Jim Anglum, Regency's high-performance computing business developer, the 21-year-old company uses confidential, proprietary processes and tools for this job. Regency is R2:2013 certified, following industry best practices established by Sustainable Electronics Recycling International, a non-profit organisation "dedicated to the responsible reuse, repair, and recycling of electronic products".
Once Titan's cabinets and their internal components were taken apart by hand, the resulting parts were sorted by commodity: metals, plastic, printed circuit assemblies, memory, and so on. Where each material went next depended on its composition: steel, aluminium, copper, and sheet metal went to a downstream metal processor, and plastic parts were recycled. Complex parts - such as printed circuit boards, CPUs, and GPUs - were shredded by special machines to isolate their precious metals; the gold and platinum were sold to refiners for a small profit.
"The only component sold for reuse is the memory", Craig Webb stated. "The AMD CPUs have no resale market value and the NVIDIA GPUs' custom packaging precludes easy resale - plus, the labour to tear down the assembly to isolate the GPU is not cost effective. However, we did send some of the GPUs to the last big XE6 system with GPUs - it's nice to have the buffer spares just in case."
Although Cray and Regency have accrued quite a bit of experience in recycling old supercomputers, Titan stands out as their biggest effort yet.
"Titan is an anomaly. I do not believe we have previously recycled a system as large as Titan", Craig Webb stated. "That said, we have done large systems before and it is really just a scaling exercise. Processing systems of this size is no different than processing servers or PCs, just much larger and time consuming."
But why dismantle a supercomputer like Titan in the first place? It was still rated the 12th-most-powerful computer in the world as it was being decommissioned. So why not move it to another institution that may find further use for it?
The simple answer is that providing all the infrastructure Titan required to function would have been cost-prohibitive. Unlike newer supercomputers, Titan needed three different cooling systems to operate: refrigerant, chilled water, and air conditioning - all very expensive to maintain. Furthermore, Titan drew about 4 to 6 megawatts of electricity on average, enough to power more than 3000 houses - not the sort of electrical service available to many institutions. Meanwhile, shrinking Titan to fewer cabinets to cut its size and power usage would have left less computing power than newer, smaller systems can deliver at a lower cost.
"Breaking it apart into pieces is worthless because it was only good for its speed as a whole", Paul Abston stated. "So, if I say, 'Well, I just want five cabinets of it', I could probably buy new technology and get away with one cabinet. It came to a point where Titan really had no purpose outside of here, unless somebody wanted to take it as a whole."
National labs like ORNL are among the few customers able to host such large supercomputer systems. Right now, in fact, Titan's former space at the OLCF is undergoing a complete makeover to prepare it for the construction of Frontier. The current ceiling will be permanently removed so that a new electrical system can be installed overhead. The floor will be pulled up to install piping for a new cooling system; then a new floor with a higher weight rating will go in to support Frontier. The revamp is scheduled for completion in spring 2021.
"It's a pretty monumental task", Paul Abston stated. "Fortunately, we are a can-do laboratory - we can get a machine installed safely and on time and we can get a machine removed safely and on time. And then we can get a room ready for the next machine safely and on time. Hopefully, it's an endless cycle. It's why we're here - to provide these resources to the citizens who paid for them."