Today, the world's top supercomputers are massively parallel machines that interconnect up to tens of thousands of nodes. When a compute node fails in such a machine, the failed node is isolated so that the rest of the system keeps operating, but the isolation method affects overall system performance and availability. In conventional node structures, pre-partitioned meshes are connected by switches, and every partition that contains a failed node is isolated as a whole. As a result, nodes that have not failed are isolated along with the failed ones, which reduces system availability.
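The availability cost of partition-level isolation can be illustrated with a short calculation. The sketch below is only an illustration: the function name, the node counts, and the rule that node number n belongs to partition n // partition_size are assumptions made for the example, not details of any particular system.

```python
def partition_isolation_cost(total_nodes, partition_size, failed_nodes):
    """Count nodes lost when whole partitions are isolated.

    Hypothetical model: nodes are numbered 0..total_nodes-1 and node n
    belongs to partition n // partition_size.
    Returns (isolated_nodes, healthy_nodes_isolated).
    """
    if total_nodes % partition_size != 0:
        raise ValueError("total_nodes must be a multiple of partition_size")
    failed = set(failed_nodes)
    if any(n < 0 or n >= total_nodes for n in failed):
        raise ValueError("failed node index out of range")
    # Every partition containing at least one failed node is isolated whole.
    bad_partitions = {n // partition_size for n in failed}
    isolated = len(bad_partitions) * partition_size
    return isolated, isolated - len(failed)


# Two failures in different partitions of a 64-node system with 16-node partitions.
isolated, healthy_lost = partition_isolation_cost(64, 16, [3, 40])
print(isolated, healthy_lost)
```

In this made-up case, two failed nodes take 32 nodes out of service, 30 of which are healthy. Isolating only the failed nodes would remove just two.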
To overcome this problem, Fujitsu invented a high-dimensional interconnect technology that does not employ partitioning switches. The technology allows failed nodes to be circumvented without reducing the level of parallelism that can be executed, so that a high level of system availability is maintained. It is used in the K computer, which interconnects 88,128 nodes and which was named the world's top-performing supercomputer in the 37th and 38th editions of the TOP500 List of the world's top supercomputers in 2011. In addition to the K computer, the technology is employed in the PRIMEHPC FX10, which is deployed in academic institutions and corporations around the world, and in the successor model to the PRIMEHPC FX10, which is currently under development.
In the high-dimensional interconnect, a predetermined number of nodes are grouped together on a grid, and the groups are linked to one another in a torus configuration. A torus is a structure in which multiple groups are connected in rings, and the rings that connect different groups are themselves combined in a grid. With this invention, group partitions can be positioned wherever one chooses. Because the partitioned units are small, a variety of parallel programmes can be executed simultaneously with high efficiency.
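The ring-per-dimension structure of a torus can be sketched in a few lines. The example below is a minimal illustration and not Fujitsu's implementation: it models only the torus links between groups (not the grid inside each group), and the function name, the coordinate scheme, and the 4 x 4 x 4 dimensions are assumptions chosen for the example.

```python
def torus_neighbors(coord, dims):
    """Return the groups directly linked to `coord` in a torus.

    coord: tuple of group coordinates, one per dimension (hypothetical scheme).
    dims:  tuple giving the number of groups along each dimension.
    Each dimension forms a ring, so coordinates wrap around modulo its size.
    """
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same length")
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            linked = list(coord)
            linked[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbors.append(tuple(linked))
    return neighbors


# A group at the edge of a 4 x 4 x 4 torus still has a link in every direction.
print(torus_neighbors((0, 0, 3), (4, 4, 4)))
```

When every dimension holds more than two groups, each group has two distinct links per dimension, so adding a dimension adds two links. Because each ring wraps around, no group sits on a boundary, and traffic on a ring can travel in either direction.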
In this configuration, partitioning switches are unnecessary, and the increase in dimensionality increases the number of connection ports, all of which serve as communication paths that contribute to computational performance. In addition, failures can be isolated not only at the level of individual partitions but also within partitions. When a failure is isolated within a partition, a virtual loop connection provides a seamless, uninterrupted pathway between the rings connecting the groups and the grid structure within individual groups.