Marc Austin | Feb 27, 2024 6:03:15 PM | 3 min read
In my “AI Needs a New Network” post last week, I noted that NVIDIA reported $13 billion in networking ARR on $18.4 billion of annual data center revenue. This week, we are digging a little deeper into the unit economics of GPUs, the Hedgehog AI Network, and LLMs offered by Hedgehog customers.
Morgan Stanley estimates that NVIDIA sold 608,000 GPUs in the quarter ended Jan. 28, bringing the total number of units sold since 2021 to more than 3.3 million. We conservatively assume all 3.3 million GPUs are actively used in the field (Total Active GPUs); counting every GPU as active pushes the per-GPU networking figure down, so the estimate errs low.
If you simply divide $13 billion of networking ARR by Total Active GPUs, you get a quotient of roughly $3,900 in networking ARR per GPU. That networking ARR is mostly for InfiniBand subscriptions, which connect GPUs peer-to-peer in back-end training networks.
The $3,900-per-year figure sounds expensive — until you consider the cost and value of a GPU. Morgan Stanley estimates the average sales price of an H100 GPU to be $30,000, and the blended ASP across all GPUs sold last quarter was $21,700. When you consider how much you are spending on a GPU, paying another 13-to-18% of its price to ensure maximum utilization of that GPU is a no-brainer, especially when the math (see below) shows an incredible return on investment (ROI).
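A quick sanity check of the arithmetic above, using the Morgan Stanley and NVIDIA figures quoted in the post (a back-of-the-envelope sketch, not financial data from either company):

```python
# Back-of-the-envelope check of the networking-ARR-per-GPU figures above.
# Inputs are the Morgan Stanley / NVIDIA numbers quoted in the post.

networking_arr = 13e9          # NVIDIA networking ARR ($)
total_active_gpus = 3.3e6      # GPUs sold since 2021, assumed active

arr_per_gpu = networking_arr / total_active_gpus
print(f"Networking ARR per GPU: ${arr_per_gpu:,.0f}")  # ~$3,939, rounded to $3,900

h100_asp = 30_000              # estimated H100 average sales price ($)
blended_asp = 21_700           # blended ASP across all GPUs last quarter ($)

print(f"As % of H100 ASP:    {arr_per_gpu / h100_asp:.0%}")     # ~13%
print(f"As % of blended ASP: {arr_per_gpu / blended_asp:.0%}")  # ~18%
```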
Dell’Oro Group notes that “800 Gbps [Ethernet] is expected to comprise the majority of the ports in AI back-end networks by 2025.” This means $13 billion of NVIDIA InfiniBand ARR will migrate to Ethernet. The market shift will happen as Hedgehog customers deploy our high-performance AI network.
Hedgehog doesn’t do this alone. We build AI network software that works together with hardware from partners like Broadcom (Ram Velaga’s team) and the Spectrum X team at NVIDIA. We can deliver better performance than traditional Ethernet for AI workloads at much lower TCO than InfiniBand. In fact, we predict that we can deliver better performance than InfiniBand, too.
Industry observers like Dell’Oro acknowledge that “One could argue that Ethernet [hardware] is one speed generation ahead of InfiniBand. Network speed, however, is not the only factor. Congestion control and adaptive routing mechanisms are also important.” These congestion control and adaptive routing mechanisms require software from Hedgehog to deliver a complete AI Network solution.
NVIDIA knows this shift is inevitable. That’s why the tech giant announced plans to launch Spectrum X this quarter, with the goal of broadly improving Ethernet effective bandwidth by 35%. NVIDIA says AI workloads create congestion that limits traditional Ethernet networks to 60% effective bandwidth; Spectrum X has a design goal of raising that to 95%. Hedgehog shares this performance goal, with congestion control and adaptive routing software that uses Spectrum X hardware to deliver 95% effective bandwidth for Hedgehog AI Ethernet. This means that if you invest in NVIDIA, Broadcom, or AMD hardware with 800 Gbps Ethernet ports, you effectively get 760 Gbps with a Hedgehog AI Network, compared to 480 Gbps with a traditional Ethernet network running AI workloads.
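The port-speed math is simple enough to sketch, using the 60% and 95% effective-bandwidth figures as given:

```python
# Effective bandwidth of an 800 Gbps Ethernet port under AI workloads,
# using the 60% (traditional Ethernet) and 95% (Spectrum X / Hedgehog
# design goal) figures quoted above.

port_speed_gbps = 800

traditional = port_speed_gbps * 0.60  # congestion-limited traditional Ethernet
hedgehog = port_speed_gbps * 0.95     # design goal with congestion control

print(f"Traditional Ethernet: {traditional:.0f} Gbps")  # 480 Gbps
print(f"Hedgehog AI Network:  {hedgehog:.0f} Gbps")     # 760 Gbps
```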
So what is 95% effective bandwidth worth? To answer that question, look at market prices for GPU time, then correlate effective AI network bandwidth with Job Completion Time. (Here’s one source of data on market prices for LLM inference models.) As I am writing this post, DeepInfra is the price leader at $0.27 per minute for mixtral-8x7b, while OpenAI charges $30/min for GPT4. Fully utilized, a single DeepInfra GPU has a theoretical annual market value of $142,000. That theoretical maximum is not achievable in practice, since Job Completion Time is constrained by the effective bandwidth of the AI network. With 60% effective bandwidth from traditional Ethernet, a DeepInfra GPU generates only $85,000 annually. With a Hedgehog AI network, it will generate roughly $135K, a $50K annual gain.
These numbers, of course, get a lot bigger for a customer like Together.AI, which prices llama2-70b-chat at $0.90 per minute (3x the gain, or roughly $150,000). If a Hedgehog customer pays the InfiniBand price of $3,900 per GPU per year, the ROI is 13x for DeepInfra or 38x for Together.AI. I mentioned before that we can deliver comparable performance at a better price, so the percentage ROI is actually a lot higher for Hedgehog customers.
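The revenue and ROI math can be sketched as follows. The per-minute prices are the ones quoted in the post, and treating per-GPU revenue as linearly proportional to effective bandwidth is the post’s simplifying assumption:

```python
# Per-GPU revenue and ROI under the post's assumption that revenue scales
# linearly with effective AI network bandwidth (60% vs 95%).

MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

def gpu_economics(price_per_minute, network_cost=3_900):
    """Return (theoretical max revenue, Hedgehog gain, ROI multiple)."""
    max_revenue = price_per_minute * MINUTES_PER_YEAR
    traditional = max_revenue * 0.60   # 60% effective bandwidth
    hedgehog = max_revenue * 0.95      # 95% effective bandwidth
    gain = hedgehog - traditional
    return max_revenue, gain, gain / network_cost

for name, price in [("DeepInfra mixtral-8x7b", 0.27),
                    ("Together.AI llama2-70b-chat", 0.90)]:
    max_rev, gain, roi = gpu_economics(price)
    print(f"{name}: max ${max_rev:,.0f}/yr, gain ${gain:,.0f}, ROI {roi:.0f}x")
    # DeepInfra: max $141,912/yr, gain $49,669, ROI 13x
    # Together.AI: max $473,040/yr, gain $165,564, ROI 42x
```

Note that exact arithmetic gives a Together.AI gain of about $165K rather than the rounded 3x/$150K figure, so the 38x ROI quoted above is on the conservative side.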