Machine learning, artificial intelligence, data science: "Big Data" takes on numerous branches, but each is rooted in the same goal of applying large, complex data sets to problems in everything from government and health care to engineering and manufacturing. Solving Big Data questions, which are broken down into mathematical or statistical models, requires computer architecture and software thousands of times more powerful than the average laptop.
Fortunately, Clemson University has one of the top four supercomputers among public institutions: the Palmetto Cluster, housed at the Advanced Materials Research Laboratory in Pendleton. The Palmetto Cluster makes use of more than 23,000 central processing unit (CPU) cores - the "brain" circuitry of a computer - and can complete trillions of mathematical operations per second. Because the cluster operates on a democratized "condominium" model, any Clemson faculty member, staff member or student can register for an account and use the Palmetto Cluster free of charge - a perk that gives Clemson students an advantage.
"To give you an idea of where computation comes into the mathematical sciences these days, I have a student who worked on a spatio-temporal modeling problem that takes a very powerful desktop about a week and a half to solve," stated Christopher McMahan, an associate professor in the School of Mathematical and Statistical Sciences. "I have graduate students who, without the Palmetto Cluster, would not be able to write their dissertations, due to the sheer computational nature of their work."
With the addition of the DGX-2 server, the computational abilities of Clemson University will be boosted with the most advanced computing power available, combining CPU technology with that of graphics processing unit (GPU) circuitry.
Built by NVIDIA, the server operates on the premise of parallel computing: one large job is split into thousands of smaller ones, each taken on by a different GPU core so that the work is completed simultaneously. On its own, a single GPU core will never compare to the computing ability of a single CPU core, yet the power of the DGX-2 comes from sheer numbers, McMahan stated. He likens the process to the flow of water through a colander.
"You're pouring water out of a barrel, and a CPU would be a pipe. That water is flowing down into this pipe, and there's so much of it that it gets bottlenecked. It can only go through the pipe so fast, whereas a GPU would be like a colander. Every hole that the water is passing through is smaller, but there are more of them, so more water can pass through," McMahan stated. "That's the difference between a CPU and a GPU. It's the 1-to-1,000s where the GPUs win."
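The colander analogy describes data parallelism: one large job is chopped into many small, independent pieces that run at the same time. A minimal sketch of the idea, using Python's standard `multiprocessing` pool as a stand-in for GPU cores (real GPU code would use a framework such as CUDA; the function names here are illustrative, not part of any DGX-2 software):

```python
# Sketch of the "colander" idea: split one large job into many small,
# independent chunks and process them in parallel, then combine results.
# Worker processes stand in for GPU cores; this is CPU-only illustration.
from multiprocessing import Pool


def partial_sum(chunk):
    # Each worker ("hole in the colander") handles one small piece.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4, chunk_size=1000):
    # Split the big job into many small, independent chunks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool(n_workers) as pool:
        # Workers process chunks simultaneously; partial results are summed.
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

Each individual worker is no faster than a serial loop over its own chunk - the speedup comes purely from how many chunks are in flight at once, which is the 1-to-1,000s advantage McMahan describes.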
The DGX-2 is outfitted with 24 CPU cores and 16 state-of-the-art GPU chips designed specifically for rapid, high-performance computing. The server can achieve processing speeds of 2 petaFLOPS, or 2,000 trillion floating-point operations per second. In other words, the DGX-2 - one of the newest and most powerful servers in the scientific supercomputing market - can complete thousands of trillions of calculations every second.
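The unit conversion behind that figure is straightforward and can be checked with a quick calculation (using only the quoted 2-petaFLOPS peak number):

```python
# Express the DGX-2's quoted 2 petaFLOPS peak in trillions of
# floating-point operations per second.
PETA = 10 ** 15      # 1 petaFLOP/s = 10^15 operations per second
TRILLION = 10 ** 12  # 1 trillion = 10^12

dgx2_peak = 2 * PETA
print(dgx2_peak // TRILLION)  # -> 2000, i.e., 2,000 trillion ops/sec
```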
While the Clemson faculty members on the grant to purchase the DGX-2 represent many research fields - computational math, statistics, operations research, and mechanical and industrial engineering, for example - they maintain that the server can be used across a variety of disciplines so long as the question being analyzed is computational and able to be broken apart for a supercomputing platform.
Because of this, Yuyuan "Lance" Ouyang, an assistant professor in the School of Mathematical and Statistical Sciences and the principal investigator on the grant proposal, said the entire university will benefit from having the DGX-2, especially graduate students who are entering an increasingly tech-oriented, data-driven job market.
"When our students go into the workforce, employers would like them to have hands-on experience with state-of-the-art deep-learning applications, and that requires equipment. Now, there is this top-brand computational resource that we actually have in our university. Students will be able to get their experience on the DGX-2, and when they go out into the workforce, they can be really confident not only in the theory of machine learning, but also in coding and running applications on a state-of-the-art deep-learning platform," Ouyang stated. "This experience is great for them."
"You'll have an entire generation of students coming out of our programs who know how to use cutting-edge computational platforms, and most people in graduate programs do not get that outside of a computer science department, so it does set us apart in that context," McMahan added.
The DGX-2 is slated to be delivered in the coming months, after which the grant proposal team will be the first to be trained on the platform. After an embargo period, the university's condominium model will open the DGX-2 to all Clemson faculty and students conducting research in Big Data.
The proposal team - Ouyang, McMahan, Qingshan Chen and Boshi Yang of the School of Mathematical and Statistical Sciences; Yiqiang Han of mechanical engineering; and Cole Smith of industrial engineering - acknowledges the Statistics and Mathematics Consulting Center, the Clemson Operations Research Institute and Clemson Computing and Information Technology (CCIT) for working together on the DURIP grant to bring the DGX-2 to campus.