Meta, the social media conglomerate, is the latest tech firm to build an “AI supercomputer” – a high-speed computer dedicated to training machine learning systems. The company claims that its new AI Research SuperCluster, or RSC, is already among the fastest machines of its kind and, when complete in mid-2022, will be the fastest in the world.
In a statement, Meta CEO Mark Zuckerberg said: “We think we have constructed the world’s fastest AI supercomputer. We’re calling it RSC, which stands for AI Research SuperCluster, and it’ll be finished later this year.”
The announcement emphasizes how important AI research is to firms like Meta. Rivals such as Microsoft and Nvidia have already launched their own “AI supercomputers,” which differ somewhat from traditional supercomputers. RSC will be used to train a variety of systems across Meta’s companies, ranging from hate speech detection algorithms on Facebook and Instagram to augmented reality capabilities that will be accessible in the company’s future AR gear.
Yes, Meta claims RSC will be used to create experiences for the metaverse, the company’s relentless branding for a web of virtual locations ranging from offices to online arenas.
In a blog post announcing the news, Meta engineers Kevin Lee and Shubho Sengupta write, “RSC will help Meta’s AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more.”
“We expect that RSC will assist us in developing whole new AI systems that can, for example, enable real-time speech translations for huge groups of people speaking different languages, allowing them to collaborate on a research project or play an AR game together.”
Work on RSC began a year and a half ago, with Meta’s engineers designing all of the machine’s systems – cooling, power, networking, and cabling – from the ground up. Phase one of RSC is now operational and consists of 760 Nvidia DGX A100 systems containing 6,080 linked GPUs (a type of processor that is particularly good at machine learning workloads). Meta says it is already delivering up to 20 times better performance on its standard computer vision research tasks.
Phase two of RSC is due to be completed by the end of 2022. At that point, it will contain some 16,000 total GPUs and will be able to train AI systems “with more than a trillion parameters on data sets as large as an exabyte.” (Raw GPU count is only a rough indicator of a system’s overall performance, but for comparison, Microsoft’s AI supercomputer, built in collaboration with research lab OpenAI, consists of 10,000 GPUs.)
These figures are impressive, but they raise the question: what exactly is an AI supercomputer? And how does it compare to traditional supercomputers – the massive machines used by universities and governments to crunch numbers in fields as varied as space, nuclear physics, and climate change?
The two types of systems, both known as high-performance computers, or HPCs, are more alike than they are different. Both look closer to datacenters than to individual computers, and both rely on vast numbers of networked processors to exchange data at blisteringly fast speeds. But as Hyperion Research HPC analyst Bob Sorensen explains to The Verge, there are important distinctions between the two. “AI-based HPCs exist in a little different environment than their traditional HPC counterparts,” says Sorensen, and the key difference is accuracy.
The short answer is that machine learning workloads require less precision than the tasks typical supercomputers are built for, so “AI supercomputers” (a fairly recent coinage) can carry out more calculations per second than their conventional counterparts using the very same hardware. That means that when Meta says it has built the “world’s fastest AI supercomputer,” it isn’t necessarily a direct comparison with the supercomputers that frequently make the news (rankings of which are compiled by the independent Top500.org and published twice a year).
To understand this distinction, you first need to know that both supercomputers and AI supercomputers make their calculations using floating-point arithmetic – a mathematical shorthand that is extremely useful for working with very large and very small numbers (the “floating point” in question is the decimal point, which “floats” between significant figures).
The degree of precision in floating-point calculations can be adjusted based on different formats, and the speed of most supercomputers is measured using 64-bit floating-point operations per second, or FLOPS. Because AI calculations require less accuracy, AI supercomputers are often measured in 32-bit or even 16-bit FLOPS instead. That’s why comparing the two types of systems is not necessarily apples to apples, though this caveat doesn’t diminish the remarkable power and capability of AI supercomputers.
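The precision trade-off is easy to see in code. Here’s a rough sketch using NumPy’s float16, float32, and float64 types, which mirror the 16-, 32-, and 64-bit formats described above (the specific numbers are illustrative, not tied to any real system):

```python
import numpy as np

# The same quantity stored at three common floating-point precisions.
# Lower-precision formats keep fewer significant digits, which is one
# reason hardware can push more operations through per second.
x = 1.00001  # 1 plus a tiny increment

hi = np.float64(x)   # 64-bit: the format traditional supercomputers are rated in
mid = np.float32(x)  # 32-bit: common in AI training
lo = np.float16(x)   # 16-bit "half precision": popular for AI workloads

print(hi - np.float64(1.0))   # the tiny increment survives at 64-bit
print(mid - np.float32(1.0))  # it roughly survives at 32-bit
print(lo - np.float16(1.0))   # it rounds away entirely at 16-bit: prints 0.0
```

A 16-bit float only carries about three significant decimal digits, so the `0.00001` increment vanishes – acceptable for training a neural network, but not for, say, simulating a nuclear reaction.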
Sorensen adds one more word of caution to the mix. As is often the case with the “speeds and feeds” approach to evaluating hardware, top speeds aren’t always representative. “Typically, HPC suppliers provide performance statistics that imply the machine’s maximum speed – that’s what we call theoretical peak performance,” Sorensen explains. “However, the true test of a successful system design is how quickly it can do the tasks it was created to do. When running real-world applications, it’s not unusual for some HPCs to reach less than 25 percent of their so-called peak performance.”
In other words, the real value of a supercomputer lies in the work it does, not in its theoretical peak performance. For Meta, that work means building moderation systems at a time when the company’s reputation is at an all-time low, and developing a new computing platform – whether based on augmented reality glasses or the metaverse – that it can dominate in the face of rivals like Google, Microsoft, and Apple. An AI supercomputer gives the company raw computing power, but Meta still has to come up with the winning strategy on its own.