
Gaudi 3 GPU

Intel Gaudi 3 is a purpose-built AI accelerator offering competitive performance for training and inference of large language models.

VRAM 128GB
TDP 900W
Contact providers for pricing

Cloud Pricing

No pricing data available for this GPU at the moment.

Prices updated daily. Last check: 4/8/2026

Performance

FP16
1835 TFLOPS
Bandwidth
3675 GB/s

Strengths & Limitations

Strengths:
  • 128 GB of high-bandwidth memory eases memory constraints when training and serving large language models
  • Standard Ethernet-based fabric eliminates the need for proprietary interconnects such as InfiniBand
  • PCIe Gen5 form factor provides compatibility with existing server infrastructure
  • 1,835 TFLOPS of FP16 performance delivers substantial compute capability
  • 3,675 GB/s of memory bandwidth supports memory-intensive AI workloads
  • 900W TDP concentrates substantial performance per slot for dense data center deployment
  • Multiple form factors (PCIe card, mezzanine card, and UBB) offer deployment flexibility

Limitations:
  • 900W power consumption requires robust cooling and power delivery infrastructure
  • Tied to Intel's Gaudi software ecosystem, which has narrower support than CUDA
  • Ethernet-based interconnect may have higher latency than dedicated AI fabrics
  • Newer architecture means fewer optimized frameworks and libraries are available
  • Single-accelerator design: no multi-accelerator configurations on a single card

Key Features

Intel Gaudi architecture optimized for AI workloads
All-Ethernet-based fabric connectivity
Standard PCIe Gen5 interface
FP8 and BF16 precision support
High I/O connectivity per accelerator
Multi-modal model acceleration
Enterprise RAG optimization
Standard infrastructure integration

About Gaudi 3

The Intel Gaudi 3 is an AI accelerator built on Intel's Gaudi architecture, positioned as a cost-effective alternative to traditional GPU solutions for large-scale AI training and inference workloads. Released in April 2024, the Gaudi 3 represents Intel's approach to AI acceleration using standard Ethernet-based fabrics rather than proprietary interconnects. With 128 GB of high-bandwidth memory and 3,675 GB/s of memory bandwidth, the Gaudi 3 delivers 1,835 TFLOPS of FP16 performance while maintaining compatibility with existing data center infrastructure through its standard PCIe Gen5 form factor. Compared to its predecessor, the Gaudi 2, it provides 2x the AI compute in FP8 and 4x in BF16, along with doubled network bandwidth for improved multi-node scaling.
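To put the 128 GB capacity in context, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The 70B-parameter model size is an illustrative assumption, not an Intel figure, and real deployments also need memory for activations, KV cache, and (for training) optimizer state.

```python
# Rough weight-memory estimate per precision; illustrative only.
# Ignores activations, KV cache, and optimizer state.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# A hypothetical 70B-parameter model:
for prec in ("fp32", "bf16", "fp8"):
    print(f"70B @ {prec}: {weight_memory_gb(70e9, prec):.0f} GB")
```

By this arithmetic, a 70B model's weights alone take 140 GB in BF16 (spilling past a single 128 GB accelerator) but only 70 GB in FP8, which is one reason FP8 support matters for single-device inference of large models.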

Common Use Cases

The Gaudi 3 is designed for large language model training and inference, multi-modal AI applications, and enterprise retrieval-augmented generation (RAG) systems. Its 128 GB memory capacity makes it suitable for training large transformer models that require substantial memory for parameters and activations. The Ethernet-based fabric architecture makes it particularly well-suited for organizations that want to scale AI workloads using existing network infrastructure rather than investing in specialized interconnects. Enterprise deployments benefit from its standard form factors and infrastructure compatibility, while the high memory bandwidth supports both training workflows and high-throughput inference scenarios.
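For high-throughput inference, the 3,675 GB/s bandwidth figure can be turned into a back-of-envelope throughput ceiling: batch-1 autoregressive decoding is typically memory-bandwidth-bound, since every generated token requires reading all model weights once. The sketch below applies that rule of thumb; the model size is a hypothetical example, and the estimate deliberately ignores KV-cache traffic, batching, and real-world efficiency.

```python
# Upper-bound tokens/s for batch-1 decode, assuming every weight byte
# is read once per token and memory bandwidth is the only limit.

def decode_tokens_per_sec(bandwidth_gbs: float,
                          num_params: float,
                          bytes_per_param: int) -> float:
    model_bytes = num_params * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# Gaudi 3's 3,675 GB/s against a hypothetical 70B model in FP8:
print(f"{decode_tokens_per_sec(3675, 70e9, 1):.1f} tokens/s ceiling")
```

Actual throughput will land well below this ceiling, but the ratio is useful for comparing accelerators: doubling bandwidth roughly doubles the batch-1 decode ceiling for the same model.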

Full Specifications

Hardware

Manufacturer
Intel
Architecture
Gaudi
TDP
900W

Memory & Performance

VRAM
128GB
Memory Bandwidth
3675 GB/s
FP16
1835 TFLOPS
Release
2024

Frequently Asked Questions

How much does a Gaudi 3 cost per hour in the cloud?

Gaudi 3 pricing varies by provider, region, and commitment level. See the Cloud Pricing section above for current rates; if no rates are listed yet, contact providers directly for pricing.

What is the Gaudi 3 best used for?

The Gaudi 3 is optimized for large language model training and inference, multi-modal AI applications, and enterprise RAG systems. Its 128 GB memory capacity and Ethernet-based scaling make it particularly suitable for organizations wanting to deploy large AI models using standard data center infrastructure.

How does Gaudi 3 compare to NVIDIA H100 for AI workloads?

The Gaudi 3 offers 128 GB of memory compared to the H100's 80 GB, providing advantages for memory-intensive models. However, the H100 delivers higher raw compute performance and has broader software ecosystem support. The Gaudi 3's Ethernet-based approach may appeal to organizations wanting to avoid proprietary interconnects, while the H100's NVLink provides lower-latency multi-GPU communication.