#bluefield


Inside the World's Largest AI Supercluster xAI Colossus

---

Summary

🚀 #Largest AI Supercomputer: The #xAI #Colossus is built with over 100,000 GPUs, massive storage, and #High Speed Networking, designed for #AI projects beyond typical #Chatbot applications.

🛠️ #Record Breaking Construction: The facility, containing over 100,000 GPUs, was constructed in just 122 days—significantly faster than traditional #Supercomputers that often take years.

💧 #Advanced Liquid Cooling System: The #Data Halls are equipped with #State Of The Art liquid cooling, using separate pipes for hot water and cold water, which efficiently manages #Heat from the #GPU Servers.

📊 #Scalable GPU Racks: Each rack includes multiple #NVIDIA #HGX H100 units, optimized with #Compact and easily serviceable designs, featuring #Cooling Manifolds and advanced #GPU configurations.

🔌 #Innovative Power Management: #Tesla #Mega Packs support the power demands of the #AI Clusters by managing microsecond power fluctuations, stabilizing #Energy Delivery to the GPU units.

🌐 #Ethernet Driven Networking: Unlike most #Supercomputers, the cluster uses #Ethernet Networking with #NVIDIA #Bluefield 3 #DPUs and #Spectrum X switches, offering robust 400 Gbps connections for efficient data flow.

youtu.be/Jf8EPSBZU7Y?si=Gi1i66


Widescreen Wonder: #LasVegasSphere
54,000 m² (~3.67 acre) interior LED display (16x16K) and an exterior LED display (‘Exosphere’) consisting of 1.23 million LED ‘pucks’. Driving all these pixels are around 150 #NVidia RTX #A6000 #GPU cards, installed in computer systems that are networked using NVidia #BlueField data processing units (#DPU) and NVidia #ConnectX6 NICs (up to 400 Gb/s), with visual content transferred from Sphere Studios in California. All this hardware uses 45 kW.
blogs.nvidia.com/blog/sphere-l

Replied in thread

@karppinen Mellanox/NVIDIA has been trying to shove #BlueField into any customer box they can for years. They're even mandatory in some configurations (e.g. DGX) and there's no shortage of stock.

The "Self-Hosted DPU Controller" mode mentioned in the video has been officially supported with BSP 4.5.0 since December 2023, but customers like #Netflix and us got access to that long before.

Netflix is probably already running this at 100 watts, but we have no confirmation.

Replied in thread

@karppinen According to the video stream for that talk, this referred to a prototype that wasn't ready or in use at the time of the talk and that consumed at least 125 watts when last measured:

m.youtube.com/watch?v=q4TZxj-D

Around 23:00

So while the idea is nothing new and it's quite possible (people have been running Yocto Linux with nginx and Offload directly on #nvidia #Bluefield for a while), those slides do not prove #netflix "can now" do it.