IT today is all about interconnectivity, which comes from a long history of combining computing resources in processing, networking, or storage. High-performance computing (HPC) takes that concept and enhances it, aggregating computing power to achieve maximum output.
Solving the world’s big computational problems requires enormous effort; HPC links computing elements into parallel processing grids that solve those problems faster. Able to perform trillions of calculations per second, HPC is thriving in the cloud computing and AI era, delivering results that are impractical or impossible with traditional computing.
“Typically, HPC is huge computers in big data centers, but sometimes cheaper versions are assembled from repurposed gaming systems or networks of distributed systems using consumer devices,” said Eleanor Watson, IEEE member, AI ethical engineer and AI faculty member at Singularity University.
It works by applying specialized algorithms to the problem, dynamically splitting the dataset or processing task into smaller pieces and sending them across an HPC network, where individual processors work on their own parts of the overall problem. This is parallel computing: the pieces are processed simultaneously and then brought together at the end to produce the complete answer.
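To make the idea concrete, here is a minimal sketch of that split, compute-in-parallel, reassemble pattern using Python's standard multiprocessing module on a single machine; the sum-of-squares workload and the eight-way split are illustrative stand-ins for a real HPC job, not a description of any particular system.

```python
# A minimal sketch of the split / compute-in-parallel / reassemble pattern.
# The sum-of-squares workload and eight-way split are illustrative only.
from multiprocessing import Pool

def partial_sum_of_squares(chunk):
    # Each worker processes its own piece of the dataset independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(8_000_000))
    n_workers = 8
    step = len(data) // n_workers
    chunks = [data[i * step:(i + 1) * step] for i in range(n_workers)]

    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)   # pieces run simultaneously

    print(sum(partials))   # partial answers brought together at the end
```

On a real cluster the same pattern is spread across many machines rather than the cores of one, but the principle of dividing the work and merging the partial results is the same.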
But it’s more than just processor clusters: HPC also requires the best networking technology possible, so data transfer is as fast and efficient as processing. The sheer size of the data sets to be partitioned and processed exponentially increases memory and storage needs across the architecture.
What are the use cases for HPC?
Imagine trying to track a human population with a notepad and pen: you wouldn’t be able to note changes the moment they happen, much less calculate the results and metrics.
The only way to calculate these large, rapidly changing measurements is to create digital models, but some digital models are so large that they are too complex for even standalone computers or servers. That’s where HPC comes in.
“HPC enables scientists and researchers to tackle computationally intensive problems that are difficult or impossible to solve on traditional desktop computers,” said Watson.
Advanced problems often require simulation and analysis to find and isolate influencing factors and measure their impact. Assumptions and guesses are not enough, and even advanced workstation calculations cannot match the level of detail and speed that can be achieved with HPC. Leveraging HPC at the enterprise level is delivering better results, greater efficiency, and higher profits in every sector, including aerospace, automotive, pharmaceuticals, and urban planning.
As one example, HPC played a vital role when the medical community was racing to find a vaccine for COVID-19: supercomputers were used to predict the spread of the infection, model the structure of the virus, and develop the most effective vaccine.
Oak Ridge National Laboratory’s Summit supercomputer played a key role in screening vast libraries of candidate compounds against molecular models of the virus to identify which might be effective as therapeutics.
But COVID modeling is not the first example of HPC in medicine: genomic profiles stored from thousands of people can be compared to identify genetic markers of disease, helping scientists design personalized treatments.
Google DeepMind has been using HPC since 2018 for AlphaFold, a system that predicts protein structures. Understanding how proteins fold gives researchers more information than ever before for designing drugs the body will respond to.
Accurately modeling climate change is not only one of our most complex problems, but also one of the most urgent. The European Centre for Medium-Range Weather Forecasts uses HPC to build simulations that enable more accurate forecasts, providing scientists with vital data on long-term climate change and the best possible warning of severe weather events.
Beyond predictive modeling, HPC can combat climate change by helping us design better wind turbines and solar panels, the optimal configurations for building and deploying them, and more efficient energy storage and transportation systems.
Building blocks of HPC
Essentially, an HPC system performs the same functions as a desktop PC, but on a much larger scale, and its components are not necessarily housed in the same physical machine.
The first building block is compute: processors take parts of the problem and perform their calculations in parallel, and that work is communicated between the processors and reassembled by the architecture into a cohesive result.
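As a rough illustration of how that communication and reassembly might look in practice, the sketch below uses mpi4py, one common Python binding for the MPI message-passing standard (an assumed choice of toolkit); the chunk sizes and the squared-values workload are invented for illustration.

```python
# A minimal scatter/gather sketch with mpi4py; each MPI rank, potentially
# running on a different node, handles one piece of the work.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # The root rank splits the problem into one chunk per rank.
    chunks = [list(range(i * 1_000, (i + 1) * 1_000)) for i in range(size)]
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)      # distribute the pieces over the network
local_max = max(x * x for x in chunk)     # each rank works on its own piece
results = comm.gather(local_max, root=0)  # reassemble the partial results

if rank == 0:
    print("overall maximum:", max(results))
# run with e.g.: mpirun -n 4 python scatter_gather.py
```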
The second is storage, which holds not only the application code but also the incoming data. Because no single magnetic tape or solid-state drive (SSD) is large enough to handle HPC workloads, multiple storage devices (such as disks) must be connected by a network.
This means that both the I/O technology within each drive – the mechanism that stores and retrieves bits – and the networking technology that links the drives into large virtual systems must be fast enough to keep up with the data volumes that HPC demands.
That, in turn, demands bandwidth and transport protocols robust enough to break up and reassemble packets of information as fast as the processors can consume them. Large datasets are difficult enough to process on their own, but when constant changes must be accounted for in real time, the network has to perform at the highest level not just to ingest the initial dataset but to keep receiving updates.
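The sketch below shows the storage side of that picture in miniature: a dataset is striped across several storage targets and the stripes are fetched back concurrently before being reassembled. The directory names, the 4 MiB stripe size and the thread-based reads are illustrative assumptions, nowhere near the scale or sophistication of a real parallel file system.

```python
# A toy sketch of striping a dataset across several storage targets and
# fetching the stripes back concurrently. All names and sizes are illustrative.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

STRIPE_SIZE = 4 * 1024 * 1024                              # 4 MiB per stripe
TARGETS = [Path(f"storage_node_{i}") for i in range(4)]    # stand-ins for networked drives

def write_striped(data: bytes) -> list[Path]:
    """Round-robin the data across the storage targets, one stripe at a time."""
    for target in TARGETS:
        target.mkdir(exist_ok=True)
    paths = []
    for offset in range(0, len(data), STRIPE_SIZE):
        stripe_index = offset // STRIPE_SIZE
        path = TARGETS[stripe_index % len(TARGETS)] / f"stripe_{stripe_index:06d}.bin"
        path.write_bytes(data[offset:offset + STRIPE_SIZE])
        paths.append(path)
    return paths

def read_striped(paths: list[Path]) -> bytes:
    """Fetch all stripes concurrently, then reassemble them in their original order."""
    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        return b"".join(pool.map(Path.read_bytes, paths))

if __name__ == "__main__":
    original = bytes(20 * 1024 * 1024)      # 20 MiB of sample data
    assert read_striped(write_striped(original)) == original
```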
Applications that model the COVID-19 genome or monitor solar weather activity are intensive in both compute and data footprint, and together with the data they use would be nearly impossible to fit on a single disk or server.
And that brings us to the final piece of the puzzle: purpose-built systems that orchestrate it all. A whole class of specialized applications manages HPC deployments, directing data to where it needs to be and allocating resources in the most efficient way, taking into account power needs, network availability, storage space, and input/output architectures.
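Here is a toy example of the kind of decision such an orchestration layer makes: matching each job's resource request to a node with enough free capacity. Real workload managers weigh far more factors (power, network topology, queue priorities, fairness), and the node and job figures below are invented purely for illustration.

```python
# A toy resource scheduler: place each job on the first node that can hold it.
# Node and job figures are invented; real schedulers consider far more factors.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_gb: int

@dataclass
class Job:
    name: str
    cpus: int
    mem_gb: int

def schedule(jobs: list[Job], nodes: list[Node]) -> dict[str, str]:
    """Greedily assign each job to a node with enough free CPUs and memory."""
    placement = {}
    for job in jobs:
        for node in nodes:
            if node.free_cpus >= job.cpus and node.free_mem_gb >= job.mem_gb:
                node.free_cpus -= job.cpus
                node.free_mem_gb -= job.mem_gb
                placement[job.name] = node.name
                break
        else:
            placement[job.name] = "pending"   # no node has room yet
    return placement

nodes = [Node("node01", 64, 256), Node("node02", 32, 128)]
jobs = [Job("climate-sim", 48, 200), Job("genome-scan", 32, 64), Job("protein-fold", 32, 128)]
print(schedule(jobs, nodes))
# {'climate-sim': 'node01', 'genome-scan': 'node02', 'protein-fold': 'pending'}
```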
Advances in networking mean that all of the above can now be fully distributed – a great example is UK Research and Innovation’s AI Research Resource (AIRR), a £300 million programme to develop HPC for AI research.
“HPC systems are often organized into clusters, with many individual computer nodes networked together,” Watson says. “One example is the AIRR, which includes supercomputers at the universities of Cambridge and Bristol. Cambridge’s ‘Dawn’ supercomputer will have more than 1,000 high-performance Intel GPUs, while Bristol’s ‘Isambard-AI’ system will have more than 5,000 GPUs.”
Largest HPC providers
According to Fortune Business Insights, the global market for high-performance computing was worth just over $50 billion in 2023 and is expected to exceed $100 billion by 2032.
The most active providers are, unsurprisingly, some of the biggest names in the technology industry: Hewlett Packard Enterprise, with a background in Software as a Service (SaaS) and cloud computing, is one of the top providers of supercomputing products such as the HPE Cray XD2000 and HPE Cray EX2500.
Dell Technologies is also increasing its focus on HPC, including supporting the University of Cambridge’s Dawn supercomputer, and is working to further expand its platform over the next few years.
Nvidia, on the other hand, is known primarily for its hardware and is building next-generation chipsets that it hopes will be the foundation for an AI revolution. Combined with its home-grown software architecture, the company’s new GPUs will further accelerate data processing across HPC deployments.
While Microsoft is best known for its software, its Azure cloud architecture integrates hardware and software to provide scalable HPC products, and in recent years the company has developed supercomputers for training AI models, such as the one it built for OpenAI to train GPT-4.
New Frontiers
While HPC is becoming a more established paradigm, data and hardware scientists are already starting to build its next evolution: exascale computing.
Exascale computing refers to a system capable of at least one exaflop – a quintillion (10^18) floating-point operations per second – and will enable the modeling and analysis of systems that are challenging even for the current generation of HPC.
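A quick back-of-the-envelope comparison shows the scale: assuming a laptop that sustains roughly 100 gigaflops (an illustrative figure), the work an exascale machine completes in a single second would keep that laptop busy for months.

```python
# Rough scale comparison; the 100 GFLOPS laptop figure is an assumption.
exascale_ops_per_second = 1e18   # one exaflop: 10^18 floating-point operations per second
laptop_ops_per_second = 1e11     # ~100 gigaflops, a rough figure for a modern laptop

seconds_on_laptop = exascale_ops_per_second / laptop_ops_per_second
print(f"{seconds_on_laptop:,.0f} seconds ≈ {seconds_on_laptop / 86_400:.0f} days")
# 10,000,000 seconds ≈ 116 days
```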
Exascale is also a stepping stone to the next generation of HPC, and Antonio Córcoles, principal research scientist at IBM Quantum, explains how it connects to the world of quantum computing.
“The next generation of high-performance computing will bring exascale computing using CPUs and GPUs, as well as the integration of quantum processing units [QPUs]. Together they solve different parts of complex problems that are best suited to their respective strengths and capabilities. This will enable new, large-scale, powerful computational capabilities, and our vision is for the QPU to become another element, a critical part, of any HPC system, including exascale computing.”