Primeur weekly 2018-08-20

Focus

NVIDIA's Tensor Core GPU offers best of both worlds for HPC and AI with multiple precision ...

Quantum computing

Another step forward on universal quantum computer ...

Quantum material is promising 'ion conductor' for research and new technologies ...

Focus on Europe

Professor Arndt Bode receives highest award for public service in Germany ...

Baycrest co-created Virtual Brain joins flagship neuroscience initiative in Europe ...

Middleware

ShareBackup could keep data in the fast lane ...

Hardware

Asetek to issue Q2 2018 with a record quarterly revenue ...

HPE triples performance and enhances energy efficiency in new supercomputer for National Renewable Energy Laboratory (NREL) ...

NVIDIA reinvents computer graphics with Turing architecture ...

NVIDIA unveils Quadro RTX, world's first ray-tracing GPU ...

CoolIT Systems announces STULZ ANZ as master distributor for liquid cooling products in Australasian region ...

Tachyum touts benefits of universal processor for hyperscale data centres, HPC and AI markets at Hot Chips 2018 ...

"Mission: Green Computing" by Supermicro introduces total cost to the environment (TCE) for leading data centres ...

Rochester Institute of Technology awarded NSF grant to advance high-tech computer architectures ...

Applications

OmniTier's CompStor brings de novo analytics to genomics ...

Deep Learning stretches up to scientific supercomputers ...

More efficient security for Cloud-based machine learning ...

Trailblazer in computational complexity theory to receive Knuth Prize ...

Fellowships recognize tomorrow's supercomputing innovators ...

Magnetic antiparticles offer new horizons for information technologies ...

Low bandwidth? Use more colors at once ...

The Cloud

Tracking down the Big Bang: CERN and Oracle extend research and development partnership ...

Premier Indian Government think tank partners with Perlin Network to advance its national distributed computing infrastructure ...

ShareBackup could keep data in the fast lane


Rice University computer scientist Eugene Ng led the development of ShareBackup, a hardware and software solution to help data centres recover from failures without slowing applications. Credit: Jeff Fitlow/Rice University.
16 Aug 2018 Houston - Anyone who has ever cursed a computer network as it slowed to a crawl will appreciate the remedy offered by scientists at Rice University. Rice engineers have developed ShareBackup, a hardware and software solution to help data centres recover from failures without slowing applications.

Rice computer scientist Eugene Ng and his team said their solution will keep data on the fast track when failures inevitably happen.

Eugene Ng introduced ShareBackup, a strategy that would allow shared backup switches in data centres to take on network traffic within a fraction of a second after a software or hardware switch failure.

He will present a peer-reviewed paper on the work at the SIGCOMM 2018 conference in Budapest, Hungary. The paper is online and available for download.

Eugene Ng said the idea would solve a common annoyance among data professionals, scientists and everyone who relies on a network to deliver results day in and day out.

"A data network consists of servers and network switches", stated Eugene Ng, a professor of computer science and electrical and computer engineering. "Switches move data packets to where they need to go. But things fail, especially in large-scale data centers with thousands of pieces of hardware."

The usual response to a failed switch is to shunt the flow of data to another line. "Generally, the network has multiple paths for connecting servers so, just like if there's a closure on the highway, we'd drive around it. This is a conventional, natural approach that makes a lot of sense: You reroute around the failure to get where you need to go."

But sometimes that other road is congested and everything slows down. "Data centres aren't the internet; they're not about people surfing websites", Eugene Ng stated. "They're about supporting data-intensive applications like data mining or machine learning. And a lot of these applications have stringent performance deadlines, so blindly rerouting traffic could be the wrong thing to do in a data centre."

Rather than the expensive option of installing redundant switches throughout a network, Ng's lab proposes putting fast switches and software in strategic locations, where they could pick up the traffic from a failed switch in less than a millisecond. When the problem is resolved, the team's software makes the backup switch available to handle another failure.
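The sharing idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the ShareBackup implementation: the class `SharedBackupPool`, its method names and the switch labels are hypothetical, and the real system involves hardware and control software that this example does not attempt to model.

```python
class SharedBackupPool:
    """Toy model: a few backup switches cover a larger group of switches.

    Hypothetical illustration only; not the actual ShareBackup design.
    """

    def __init__(self, backups):
        self.free = list(backups)   # idle backup switches
        self.covering = {}          # failed switch -> backup standing in for it

    def on_failure(self, switch):
        """Assign an idle backup to a failed switch; return None if none is idle."""
        if switch in self.covering:
            return self.covering[switch]
        if not self.free:
            return None
        backup = self.free.pop()
        self.covering[switch] = backup
        return backup

    def on_repair(self, switch):
        """Return the backup to the pool once the failed switch is fixed."""
        backup = self.covering.pop(switch, None)
        if backup is not None:
            self.free.append(backup)
        return backup


# One backup shared by several switches (labels are made up).
pool = SharedBackupPool(["backup-1"])
first = pool.on_failure("switch-A")    # backup-1 takes over for switch-A
second = pool.on_failure("switch-B")   # no idle backup is left
pool.on_repair("switch-A")             # backup-1 is released
third = pool.on_failure("switch-B")    # backup-1 is free to cover switch-B
```

The model shows why a small pool can be enough when failures are rare and short-lived: a backup is tied up only while a failed switch is being repaired, and is then free to cover the next failure.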

The switchover is fast enough - the failure-recovery time is 0.73 milliseconds, including latency from hardware and control systems - that most users would never know that part of the system had failed.

"The reality is that the fraction of devices that fail at any given time is very small, and most of these failures can be addressed by things like rebooting the device", Eugene Ng stated. "Sometimes the software gets screwed up and a simple power cycle will bring it back. These failures may also not last long. These are the characteristics we're trying to exploit", he stated. "Because of that, we can get away with having very few devices back up a large number of devices."

Eugene Ng said ShareBackup could save data centers time and money not only by maintaining full bandwidth but by also helping to analyze problems, including misconfigurations that commonly lead to network failure.

"Part of our work is to help data centers figure out what went wrong in the network", he stated. "Once the backup is activated, you can take the failed device out of the production network and test it to identify which component caused the problem."

"Now, if we take two devices out and can't figure out which went bad, both need to be replaced", he stated. "It's very likely only one of the devices is having the problem. Our software can diagnose these devices in a semiautomatic manner, and if one of the parts is good, it can be reinstated."

Lead authors of the paper are Rice graduate student Dingming Wu and alumnus Yiting Xia, now a computer scientist at Facebook. Co-authors are Rice graduate students Xiaoye Steven Sun, Xin Sunny Huang and Simbarashe Dzinamarira.

The National Science Foundation supported the research.

Source: Rice University
