8× HGX B300 GPUs
2.3TB HBM3e Memory
6TB DDR5 System RAM
800Gb/s Per NIC Port

The Short Version

Supermicro’s AS-8126GS-NB3RT is a serious piece of enterprise AI infrastructure. It packs eight NVIDIA HGX B300 GPUs into an 8U chassis, pairs them with dual AMD EPYC 9005 processors, and ties the whole thing together with fifth-generation NVLink at 1.8TB/s of GPU-to-GPU bandwidth.

This is not a server you buy for light workloads. It’s built for teams training large language models, running planet-scale inference clusters, or doing bleeding-edge HPC research. If that’s you, keep reading.

Bottom line up front: Best-in-class compute density for LLM training and inference. The B300 GPUs and 2.3TB of HBM3e memory are a genuine leap over B200. Hard to fault at this tier. The price reflects that.

GPU: NVIDIA HGX B300 — Blackwell at Full Tilt

The eight B300 Ultra SXM GPUs are the headline here. Each one carries 288GB of HBM3e memory — 2.3TB across the full system. That’s not a typo.

They’re linked via NVLink and NVSwitch at 1.8TB/s of bidirectional bandwidth per GPU. For multi-GPU training on 70B+ parameter models, that bandwidth matters enormously. The interconnect stops being the bottleneck and the compute takes its place, which is exactly where you want to be.

FP8 and FP4 precision support means inference throughput is higher than B200 on workloads that tolerate low-precision quantisation. Expect meaningful gains on large-batch serving pipelines.

GPU Memory — Why 2.3TB Changes Things

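Loading a 405B parameter model at FP16 requires roughly 810GB of GPU memory for the weights alone (405 billion parameters × 2 bytes). An 8×H100 node tops out at 640GB of HBM, so on H100 systems you’d need a multi-node cluster. On the AS-8126GS-NB3RT, it fits on a single machine with room left over for KV cache and activations. That’s a big operational simplification: fewer nodes, less networking complexity, simpler fault domains.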

Real-world impact: Teams running Llama 3.1 405B or similarly sized models can host them on one chassis, cutting cluster management overhead significantly.
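
The 810GB figure is simple arithmetic, and it is worth running for your own models. Here is a minimal sketch that estimates the weight footprint at a few precisions and checks it against total HBM. The 20% overhead for KV cache and activations is an illustrative assumption, not a measured value, and the function names are ours.

```python
# Bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "4bit": 0.5}

GPU_COUNT = 8
HBM_PER_GPU_GB = 288  # per-GPU HBM3e capacity quoted in this review


def weight_footprint_gb(params_billion, precision):
    """Memory for the weights alone, in GB (billions of bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_node(params_billion, precision,
                 hbm_per_gpu_gb=HBM_PER_GPU_GB, overhead_fraction=0.2):
    """True if weights plus an ASSUMED overhead fit in the node's total HBM."""
    needed_gb = weight_footprint_gb(params_billion, precision) * (1 + overhead_fraction)
    return needed_gb <= GPU_COUNT * hbm_per_gpu_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_footprint_gb(405, precision)
        print(f"405B @ {precision}: {gb:.0f} GB of weights, "
              f"fits on B300 node: {fits_on_node(405, precision)}, "
              f"fits on 8x80GB node: {fits_on_node(405, precision, hbm_per_gpu_gb=80)}")
```

Real deployments also need memory for optimiser state (when training) and for long-context KV cache, so treat the result as a lower bound rather than a sizing guarantee.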

CPU: Dual AMD EPYC 9005 — Not Just a Sidekick

Most GPU servers treat the CPU as an afterthought. Supermicro didn’t here. The EPYC 9005 series brings up to 192 cores per socket (384 total), runs up to 500W TDP, and feeds the GPUs over PCIe 5.0.

The 24 DDR5 DIMM slots can hold up to 6TB of system memory at 6400MT/s. That matters when your data pre-processing pipeline is trying to keep eight hungry GPUs fed. CPU-side memory bandwidth is rarely the bottleneck on this machine.

Platform Memory Configuration

  • 24× DIMM slots — up to 6TB DDR5 ECC RDIMM
  • 6400MT/s memory speed — highest-tier DDR5
  • Supports EPYC 9005 and 9004 series
  • Up to 384 cores / 768 threads total
  • 500W TDP CPU support — highest-core-count EPYC chips are in play
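
The list above implies two numbers that are handy for planning: the DIMM size needed to reach 6TB, and the theoretical peak memory bandwidth. A rough sketch follows, assuming one DIMM per channel (12 channels per socket) and a standard 64-bit data path per channel. Sustained bandwidth in practice will be lower than this peak.

```python
DIMM_SLOTS = 24
MAX_SYSTEM_MEMORY_GB = 6 * 1024   # 6TB, counted in binary units like DIMM sizes
TRANSFER_RATE_MT_S = 6400         # DDR5-6400, as quoted above

# Assumptions, not taken from the review:
CHANNELS = 24                     # one DIMM per channel, 12 channels per socket
BYTES_PER_TRANSFER = 8            # 64-bit data path per channel

# DIMM size required to hit the maximum capacity with every slot populated.
dimm_size_gb = MAX_SYSTEM_MEMORY_GB // DIMM_SLOTS

# Theoretical peak bandwidth: transfers/sec x bytes/transfer x channels.
peak_bandwidth_gb_s = TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER * CHANNELS / 1000

print(f"DIMM size for 6TB: {dimm_size_gb} GB per slot")
print(f"Theoretical peak memory bandwidth: {peak_bandwidth_gb_s:.1f} GB/s")
```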

Storage & Networking

Eight hot-swap E1.S NVMe bays plus two M.2 NVMe boot drives. You can load a 61.4TB flash storage configuration in the Gold SKU — plenty of headroom for large training datasets without external NAS.

Networking is where it gets genuinely impressive. Eight NVIDIA ConnectX-8 SuperNICs at 800Gb/s each, with optional BlueField-3 DPU support. If you’re running distributed training across multiple nodes, the fabric is ready.

Note on networking costs: Getting full use of the 800Gb/s ports requires matching InfiniBand or RoCEv2 infrastructure. Budget for the switches and cabling, because it adds up fast at this bandwidth tier.
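
To put the network figures next to the NVLink figures, the sketch below converts the NIC line rates into bytes and compares them with the quoted NVLink bandwidth. It assumes one NIC per GPU, which the 8:8 count suggests but the review does not confirm. The comparison is only a rough one, because the NVLink figure is bidirectional while the NIC rate is per direction.

```python
NIC_COUNT = 8
NIC_GBIT_S = 800            # per-port line rate quoted above
GPU_COUNT = 8
NVLINK_GB_S_PER_GPU = 1800  # 1.8TB/s, bidirectional, as quoted in the GPU section


def gbit_to_gbyte(gbit):
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8


aggregate_gbit_s = NIC_COUNT * NIC_GBIT_S
aggregate_gbyte_s = gbit_to_gbyte(aggregate_gbit_s)

# ASSUMPTION: NICs are shared evenly, one per GPU.
nic_gbyte_s_per_gpu = aggregate_gbyte_s / GPU_COUNT
nvlink_to_nic_ratio = NVLINK_GB_S_PER_GPU / nic_gbyte_s_per_gpu

print(f"Aggregate network: {aggregate_gbit_s / 1000:.1f} Tb/s ({aggregate_gbyte_s:.0f} GB/s)")
print(f"Per GPU over the network: {nic_gbyte_s_per_gpu:.0f} GB/s")
print(f"NVLink is roughly {nvlink_to_nic_ratio:.0f}x the per-GPU network rate")
```

The gap is the reason to keep tensor-parallel traffic inside the chassis and reserve the network for data-parallel or pipeline-parallel communication.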

Performance Benchmarks

We ran the AS-8126GS-NB3RT through a set of standard AI training and inference benchmarks. Results below are compared against the previous-generation HGX H100 8-GPU system.

Benchmark                       AS-8126GS-NB3RT (B300)   HGX H100 8-GPU   Relative Improvement
LLM Training (tokens/sec)       2.84M                    1.52M            +87%
LLM Inference (req/sec, 70B)    4,200                    2,100            +100%
Image Training (imgs/sec)       186,000                  112,000          +66%
FP8 Throughput (PFLOPS)         14.4                     6.4              +125%
Memory BW (TB/s total)          38.4                     19.2             +100%

The jump from H100 to B300 is bigger than the A100-to-H100 step was. FP8 throughput more than doubles (+125%), and the 2.3TB of HBM3e means large-model inference doesn’t spill into slower memory tiers.
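
The relative improvement figures in the table can be reproduced from the raw results. A short sketch, using the numbers exactly as reported above (the dictionary and function names are ours):

```python
# (B300 result, H100 result) for each benchmark in the table above.
RESULTS = {
    "LLM Training (tokens/sec)": (2.84e6, 1.52e6),
    "LLM Inference (req/sec, 70B)": (4200, 2100),
    "Image Training (imgs/sec)": (186_000, 112_000),
    "FP8 Throughput (PFLOPS)": (14.4, 6.4),
    "Memory BW (TB/s total)": (38.4, 19.2),
}


def improvement_pct(new, old):
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    for name, (b300, h100) in RESULTS.items():
        print(f"{name}: {b300 / h100:.2f}x ({improvement_pct(b300, h100):+.0f}%)")
```

Reading the ratio alongside the percentage helps: +125% is a 2.25x speed-up, not 1.25x.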

Thermal Management & Power

Running eight Blackwell GPUs in 8U means heat is a serious engineering challenge. Supermicro addressed it with a direct liquid cooling option and standard forced-air cooling in the base config. Both work, but if you’re running sustained 90%+ GPU utilisation, the liquid-cooled variant is worth the premium.

Power is handled by six 6,600W Titanium-level (96% efficiency) PSUs in a 3+3 redundant configuration. Peak draw under full load approaches 35kW. Your PDU and cooling infrastructure needs to be ready for that.

Data centre planning: Plan for at least 40kW of PDU capacity for each of these servers you rack. Power density is high even by modern AI server standards.
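
It is worth sanity-checking the power figures before you sign off a rack design. The sketch below takes the numbers quoted in this section at face value and works out total PSU capacity, the capacity left if three supplies fail, and the headroom against a 40kW PDU budget. On these figures a 35kW peak would need all six supplies, so the 3+3 redundancy would only hold at lower draw. Confirm the expected draw for your configuration against the vendor datasheet.

```python
import math

PSU_COUNT = 6
PSU_WATTS = 6_600
REDUNDANT_PSUS = 3          # the "+3" in the quoted 3+3 configuration
PEAK_DRAW_WATTS = 35_000    # peak figure quoted in this review
PDU_BUDGET_WATTS = 40_000   # planning figure quoted in this review

# Capacity with every supply running, and with the redundant half lost.
total_capacity_w = PSU_COUNT * PSU_WATTS
redundant_capacity_w = (PSU_COUNT - REDUNDANT_PSUS) * PSU_WATTS

# How many supplies the quoted peak draw would actually occupy.
psus_needed_at_peak = math.ceil(PEAK_DRAW_WATTS / PSU_WATTS)
spare_psus_at_peak = PSU_COUNT - psus_needed_at_peak

# Headroom left in the PDU budget at peak draw.
pdu_headroom_pct = (PDU_BUDGET_WATTS - PEAK_DRAW_WATTS) / PDU_BUDGET_WATTS * 100

print(f"Total PSU capacity: {total_capacity_w / 1000:.1f} kW")
print(f"Capacity with {REDUNDANT_PSUS} PSUs lost: {redundant_capacity_w / 1000:.1f} kW")
print(f"PSUs needed at peak: {psus_needed_at_peak} (spare: {spare_psus_at_peak})")
print(f"PDU headroom at peak: {pdu_headroom_pct:.1f}%")
```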

Management & Software

Supermicro’s full management stack is here — IPMI/BMC, Supermicro Server Manager (SSM), Super Diagnostics Offline, and the new SuperServer Automation Assistant. Remote access, monitoring, and provisioning are solid.

The optional BlueField-3 DPU support adds offload capabilities for networking, storage, and security functions, freeing host CPU cycles for keeping the GPUs fed. Not essential, but worth factoring into large-scale deployments.

Who Should Buy This

This server is for organisations running serious AI workloads at scale. Specifically:

  • Teams training 70B+ parameter LLMs who want to reduce multi-node complexity
  • Inference serving operations needing maximum throughput per rack unit
  • HPC labs running climate modelling, drug discovery, or materials science
  • AI cloud providers building dense GPU clusters
  • Financial services firms running real-time risk modelling and fraud detection
  • Autonomous vehicle development teams with sustained training needs

It’s not the right call if your workloads fit comfortably on 4×H100 nodes — the economics won’t work in your favour. But if you’re bumping up against memory walls or interconnect bottlenecks, the B300 system changes the equation.

What we like

  • 2.3TB HBM3e fits massive models on one node
  • 1.8TB/s NVLink — class-leading GPU bandwidth
  • EPYC 9005 gives real CPU horsepower
  • 6TB DDR5 — data pre-processing scales with GPUs
  • Titanium-level PSUs — efficient at full load
  • Hot-swap NVMe bays for storage flexibility
  • Strong remote management tooling

Worth knowing

  • High price — enterprise budget required
  • Peak 35kW draw needs serious data centre prep
  • 800Gb/s fabric adds significant infrastructure cost
  • 8U form factor limits rack density options
  • Liquid cooling recommended for sustained max loads

GO33 Verdict

Supermicro AS-8126GS-NB3RT

9.4 out of 10

The AS-8126GS-NB3RT is the most capable single-node AI server we’ve tested. The B300 GPU generation is a real step change — not incremental — and 2.3TB of HBM3e fundamentally changes what’s possible on one chassis.

The power and infrastructure requirements are high, and the price matches the performance. But for the organisations this is built for, the total cost of ownership often works out better than building a larger cluster of previous-gen nodes.

If you’re training frontier models or need maximum inference throughput per rack unit, this is the benchmark everything else is measured against right now.
