
Gaudi 3 GPU

Intel Gaudi 3 is a purpose-built AI accelerator offering competitive performance for training and inference of large language models.

VRAM 128GB
TDP 900W
Contact providers for pricing

Cloud Pricing

No pricing data available for this GPU at the moment.

Prices updated daily. Last check: 4/8/2026

Performance

FP16
1835 TFLOPS
Bandwidth
3675 GB/s

Strengths & Limitations

Strengths:
  • 128 GB of high-bandwidth memory eases memory constraints when training and serving large language models
  • Standard Ethernet-based fabric eliminates the need for proprietary interconnects such as InfiniBand
  • PCIe Gen5 form factor provides compatibility with existing server infrastructure
  • 1,835 TFLOPS of FP16 performance delivers substantial compute capability
  • 3,675 GB/s of memory bandwidth supports memory-intensive AI workloads
  • 900W TDP concentrates substantial performance per slot for dense data center deployment
  • Multiple form factors (PCIe card, mezzanine card, and UBB) offer deployment flexibility

Limitations:
  • 900W power consumption requires robust cooling and power delivery infrastructure
  • Tied to Intel's Gaudi software ecosystem, which has narrower support than CUDA
  • Ethernet-based interconnect may have higher latency than dedicated AI fabrics
  • Newer architecture means fewer optimized frameworks and libraries are available
  • Single-accelerator design: no multi-accelerator configurations on a single card

Key Features

Intel Gaudi architecture optimized for AI workloads
All-Ethernet-based fabric connectivity
Standard PCIe Gen5 interface
FP8 and BF16 precision support
High I/O connectivity per accelerator
Multi-modal model acceleration
Enterprise RAG optimization
Standard infrastructure integration

About Gaudi 3

The Intel Gaudi 3 is an AI accelerator built on Intel's Gaudi architecture, positioned as a cost-effective alternative to traditional GPU solutions for large-scale AI training and inference workloads. Released in April 2024, the Gaudi 3 represents Intel's approach to AI acceleration using standard Ethernet-based fabrics rather than proprietary interconnects. With 128 GB of high-bandwidth memory and 3,675 GB/s of memory bandwidth, the Gaudi 3 delivers 1,835 TFLOPS of FP16 performance while maintaining compatibility with existing data center infrastructure through its standard PCIe Gen5 form factor. Compared to its predecessor, the Gaudi 2, it provides 2x the AI compute in FP8 and 4x in BF16, along with doubled network bandwidth for improved multi-node scaling.
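To put the 128 GB capacity in context, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The 70B-parameter model size is an illustrative assumption, not an Intel figure, and real deployments also need memory for activations, KV cache, and (for training) optimizer state.

```python
# Rough weight-memory estimate per precision; illustrative only.
# Ignores activations, KV cache, and optimizer state.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# A hypothetical 70B-parameter model:
for prec in ("fp32", "bf16", "fp8"):
    print(f"70B @ {prec}: {weight_memory_gb(70e9, prec):.0f} GB")
```

By this arithmetic, a 70B model's weights alone take 140 GB in BF16 (spilling past a single 128 GB accelerator) but only 70 GB in FP8, which is one reason FP8 support matters for single-device inference of large models.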

Common Use Cases

The Gaudi 3 is designed for large language model training and inference, multi-modal AI applications, and enterprise retrieval-augmented generation (RAG) systems. Its 128 GB memory capacity makes it suitable for training large transformer models that require substantial memory for parameters and activations. The Ethernet-based fabric architecture makes it particularly well-suited for organizations that want to scale AI workloads using existing network infrastructure rather than investing in specialized interconnects. Enterprise deployments benefit from its standard form factors and infrastructure compatibility, while the high memory bandwidth supports both training workflows and high-throughput inference scenarios.
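For high-throughput inference, the 3,675 GB/s bandwidth figure can be turned into a back-of-envelope throughput ceiling: batch-1 autoregressive decoding is typically memory-bandwidth-bound, since every generated token requires reading all model weights once. The sketch below applies that rule of thumb; the model size is a hypothetical example, and the estimate deliberately ignores KV-cache traffic, batching, and real-world efficiency.

```python
# Upper-bound tokens/s for batch-1 decode, assuming every weight byte
# is read once per token and memory bandwidth is the only limit.

def decode_tokens_per_sec(bandwidth_gbs: float,
                          num_params: float,
                          bytes_per_param: int) -> float:
    model_bytes = num_params * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# Gaudi 3's 3,675 GB/s against a hypothetical 70B model in FP8:
print(f"{decode_tokens_per_sec(3675, 70e9, 1):.1f} tokens/s ceiling")
```

Actual throughput will land well below this ceiling, but the ratio is useful for comparing accelerators: doubling bandwidth roughly doubles the batch-1 decode ceiling for the same model.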

Full Specifications

Hardware

Manufacturer
Intel
Architecture
Gaudi
TDP
900W

Memory & Performance

VRAM
128GB
Memory Bandwidth
3675 GB/s
FP16
1835 TFLOPS
Release
2024

Frequently Asked Questions

How much does a Gaudi 3 cost per hour in the cloud?

Gaudi 3 pricing varies by provider, region, and commitment level. See the Cloud Pricing section above for current rates; if no rates are listed yet, contact providers directly for pricing.

What is the Gaudi 3 best used for?

The Gaudi 3 is optimized for large language model training and inference, multi-modal AI applications, and enterprise RAG systems. Its 128 GB memory capacity and Ethernet-based scaling make it particularly suitable for organizations wanting to deploy large AI models using standard data center infrastructure.

How does Gaudi 3 compare to NVIDIA H100 for AI workloads?

The Gaudi 3 offers 128 GB of memory compared to the H100's 80 GB, providing advantages for memory-intensive models. However, the H100 delivers higher raw compute performance and has broader software ecosystem support. The Gaudi 3's Ethernet-based approach may appeal to organizations wanting to avoid proprietary interconnects, while the H100's NVLink provides lower-latency multi-GPU communication.