The North Star, also known as the Pole Star, is located almost exactly at the north celestial pole. Seemingly a fixed point in the sky, the North Star often symbolizes direction, guidance, stability, and purpose. With the achievement of recent Exascale computing goals, the question arises: What is the new North Star for high-performance computing? This was discussed at this year’s invite-only Salishan Conference on High-Speed Computing, where global leaders in HPC gathered as they have for the last 40+ years.
A typical starting point in defining an HPC goal is performance measurement. One measure of computing performance is a machine’s ability to execute large numbers of floating-point operations per second (FLOPS). FLOPS can be performed at different measures of precision, but the standard measure adopted by the supercomputing community is based on 64-bit (double-precision floating-point format) operations per second using the High Performance LINPACK (HPLinpack) benchmark.
We are now in the exaflop age, with systems capable of calculating at least 10¹⁸ FLOPS. In 2020, the Japanese supercomputer Fugaku became the first system to exceed an exaFLOPS, reaching 1.42 exaFLOPS on the mixed-precision HPL-AI benchmark (rather than the standard double-precision HPLinpack measure). Several countries and regions, including the United States, the United Kingdom, the European Union, Japan, China, Taiwan, and India, have either introduced exascale computers or announced plans to do so.
Now that the exascale computing barrier has been broken, the new North Star for supercomputing is the creation of zettascale machines capable of executing at least 10²¹ FLOPS. Engineers and scientists currently anticipate that the first zettascale computers will come online circa 2035, but planning for these machines is still in its early days.
What is known is that next-generation zettascale computers cannot simply be “more of the same.” Existing exascale computers consume approximately 20 megawatts of power, so at 1,000 times the performance, a zettascale machine built with today’s technology would require its own nuclear power station. AMD CEO Lisa Su made the same point at the ISSCC conference, noting that a zettascale computer assembled from today’s supercomputing technologies would consume about 21 gigawatts, roughly the output of 21 nuclear power plants.
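The power-scaling argument above can be checked with simple arithmetic. This is a back-of-envelope sketch using only the figures cited in the text (roughly 20 MW for an exascale system, and zetta = 1,000 × exa); the variable names are illustrative, not from any source.

```python
# Back-of-envelope check of the zettascale power problem.
# Assumption (from the text): an exascale system draws roughly 20 MW.
exascale_power_mw = 20

# Zetta is 1,000x exa, so a naive scale-up multiplies power by 1,000.
scale_factor = 1_000
naive_zettascale_power_gw = exascale_power_mw * scale_factor / 1_000  # MW -> GW

print(f"Naive zettascale power draw: ~{naive_zettascale_power_gw:.0f} GW")
```

The result, roughly 20 GW, is in line with the ~21 GW figure attributed to Lisa Su, and illustrates why a straight scale-up of today's technology is a non-starter.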
This means that achieving the North Star of zettascale computing is going to require new computing devices, architectures, and technologies. As a starting point, China’s National University of Defense Technology has proposed the following metrics for a zettascale computer:
- Total power consumption: 100 MW
- Power efficiency: 10 teraflops/watt
- Peak performance per node: 10 petaflops
- Communication bandwidth between nodes: 1.6 terabits/second
- I/O bandwidth: 10 to 100 petabytes/second
- Storage capacity: 1.0 zettabyte
- Floor space: 1,000 square meters
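The proposed metrics above are internally consistent, which can be verified with a quick sanity check. The sketch below uses only the numbers from the list; it simply confirms that the power and efficiency targets multiply out to one zettaFLOPS, and derives the node count those targets imply.

```python
# Sanity check: do the proposed zettascale metrics add up?
total_power_w = 100e6            # 100 MW total power consumption
efficiency_flops_per_w = 10e12   # 10 teraflops/watt power efficiency
peak_flops_per_node = 10e15      # 10 petaflops per node

# Power budget x efficiency target gives the machine's peak performance.
peak_flops = total_power_w * efficiency_flops_per_w
print(f"Implied peak performance: {peak_flops:.0e} FLOPS")  # 1e+21 = 1 zettaFLOPS

# Dividing by per-node performance gives the implied system size.
node_count = peak_flops / peak_flops_per_node
print(f"Implied node count: {node_count:,.0f}")  # 100,000 nodes
```

In other words, the 100 MW budget and 10 teraflops/watt efficiency targets jointly define a 10²¹ FLOPS machine of roughly 100,000 nodes, which frames how aggressive the efficiency goal is: today's most efficient systems are orders of magnitude below 10 teraflops/watt.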
There were other general points of agreement among the experts in attendance at Salishan regarding reaching zettascale performance. For one, it will not be economically feasible to create zettascale machines that can address only a single class of problems. Instead, these new machines will need to be heterogeneous in nature and support a wide range of scientific and AI/ML applications. It was also widely agreed that interconnect will be a focal point of future HPC system architectures, and that optical interconnect is necessary to support the extreme data bandwidth requirements of these next-generation systems.
John Shalf, Department Head for Computer Science at Lawrence Berkeley National Laboratory and a Salishan conference attendee, noted that “much of this is driven by concerns that off-chip bandwidth will create an upper limit to practical computing performance unless the bandwidth bottleneck can be addressed. Both co-packaging and integrated photonics can address these bandwidth bottlenecks, but photonics solutions enable more options for supplying bandwidth that can go longer distances. Breaking out of the package provides system architects a lot more integration options.”
Earl Joseph, CEO of Hyperion Research, gave a presentation that highlighted a recent study by his firm regarding the market readiness and expectations for optical I/O connectivity from HPC/AI users and vendors. The primary research outcomes are depicted in the infographic below.
Infographic illustrating data from Hyperion’s research white paper, “Strong Market Sentiment for Optical I/O Connectivity”
The key finding of the study is that both users and vendors identified optical I/O as the technology with the greatest potential to positively change HPC architectures in the next 2-6 years (in-memory computing ranked as the second-highest-impact technology area). Also rated as high-impact are physical interface standards (e.g., CXL and UCIe) that can facilitate standardized connections between the host SoC and in-package optical I/O chiplets.
Not surprisingly, according to the report, predominant system issues for future architectures to address include system scale-out, lack of system composability, power consumption, and network throughput.
What was more unexpected was the resounding demand for disaggregation and resource composability in future architectures. Seventy-five percent of respondents (both HPC users and vendors) agree that there is a strong need for disaggregation of system resources to enable workload-driven composable infrastructure.
As the HPC community defines its new North Star, new technologies and architectures are needed to meet the demands for continued scale-out, network throughput, lower power, and composability. Optical I/O holds the promise of meeting these demands and transforming the HPC architectures of the future.
To delve deeper into the world of disaggregated resources and cutting-edge optical interconnect technologies, join us for a panel discussion on June 8 with industry leaders GlobalFoundries, Hyperion Research, Microsoft, NVIDIA, and Quantifi Photonics to discover how each company is driving the push toward next-generation HPC and AI infrastructure. Register here.