What GPU types does Deep Infra offer?

Deep Infra offers various GPU types including A100 SXM, H100 SXM, H200, HGX B300, B200. Check the pricing table above for current availability and pricing.

How do I get started with Deep Infra?

Create an account, Enable billing, Pick a GPU option, Launch and connect, Monitor usage

What are Deep Infra's main advantages?

Deep Infra's main advantages include: Simple OpenAI-compatible API alongside controllable GPU rentals, Competitive hourly rates for flagship NVIDIA GPUs including latest B200 and B300, Fast provisioning with SSH access for dedicated instances (ready in ~10 seconds), Supports custom deployments in addition to hosted public models.

What are Deep Infra's limitations?

Deep Infra's main limitations include: Region list is not clearly published in the public marketing pages, Primarily focused on inference and GPU rentals rather than broader cloud services, Newer player compared to established cloud providers.

Deep Infra

Optimized inference for open-source models

Rapidly-catching neocloud🇺🇸 USinferenceopen-sourcebudget

Last reviewed Mar 14, 2026

Deep Infra offers serverless AI APIs and dedicated GPU rentals with fast SSH access and low hourly pricing across flagship NVIDIA accelerators.

Visit Deep Infra Documentation

GPU Models

$0.89

From / hour

LLM Models

$0.02

From / 1M input

Available GPUs

Hourly on-demand pricing. Click column headers to sort.

Prices last updated: May 13, 2026

GPU Model↑	Memory↑	GPUs	Price / hr↑	Updated↑
A100 SXM	80GB	1×	$0.89/hr	5/13/2026
B200	192GB	1×	$2.79/hr	5/13/2026
H100 SXM	80GB	1×	$1.79/hr	5/13/2026
H200	141GB	1×	$2.19/hr	5/13/2026
HGX B300	288GB	1×	$4.20/hr	5/13/2026

LLM API Pricing

Pay-per-token pricing. Prices shown per 1M tokens.

Prices last updated: May 13, 2026

Model	Creator	Context	Input/1M	Output/1M	Updated
Mistral Nemo	Mistral	131K	$0.020	$0.040	5/13/2026
Llama 3.1 8B	Meta	128K	$0.020	$0.030	5/13/2026
Qwen 3.5 9B	Alibaba	256K	$0.030	$0.150	5/13/2026
Llama 3 8B	Meta	8K	$0.030	$0.040	4/30/2026
GPT-OSS-20B	OpenAI	128K	$0.030	$0.140	5/13/2026
Nemotron Nano 9B v2	NVIDIA	131K	$0.040	$0.160	5/13/2026
Gemma 3 4B	Google	131K	$0.040	$0.080	5/13/2026
Gemma 3 12B	Google	131K	$0.040	$0.130	5/13/2026
Mistral Small	Mistral	32K	$0.050	$0.080	5/13/2026
Nemotron 3 Nano 30B	NVIDIA	262K	$0.050	$0.200	5/13/2026

Pros & Cons

Advantages

Simple OpenAI-compatible API alongside controllable GPU rentals
Competitive hourly rates for flagship NVIDIA GPUs including latest B200 and B300
Fast provisioning with SSH access for dedicated instances (ready in ~10 seconds)
Supports custom deployments in addition to hosted public models

Limitations

Region list is not clearly published in the public marketing pages
Primarily focused on inference and GPU rentals rather than broader cloud services
Newer player compared to established cloud providers

Key Features

Serverless Model APIs

OpenAI-compatible endpoints for 100+ models with autoscaling and pay-per-token billing

Dedicated GPU Rentals

B200 instances with SSH access spin up in about 10 seconds and bill hourly

Custom LLM Deployments

Deploy your own Hugging Face models onto dedicated A100, H100, H200, B200, or B300 GPUs

Transparent GPU Pricing

Published per-GPU hourly rates for A100, H100, H200, B200, and B300 with competitive pricing

Inference-Optimized Hardware

All hosted models run on H100 or A100 hardware tuned for low latency

Multimodal AI Support

Comprehensive APIs for text, vision, image generation, video generation, speech recognition, and text-to-speech

SOC 2 & ISO 27001 Compliance

Zero retention policy with enterprise-grade security certifications for data privacy and protection

Compute Services

Serverless Inference

Hosted model APIs with autoscaling on H100/A100 hardware.

OpenAI-compatible REST API surface
Runs 100+ public models including DeepSeek, Qwen, Llama, Claude, and Gemini families
Multimodal support for text, vision, image generation, video generation, and speech
Autoscaling for low latency without manual instance management

Dedicated GPU Instances

On-demand GPU nodes with SSH access for custom workloads.

Pricing Options

Option	Details
Serverless pay-per-token	OpenAI-compatible inference APIs with pay-per-request billing on H100/A100 hardware
Dedicated GPU hourly rates	Published transparent hourly pricing for A100, H100, H200, B200, and B300 GPUs with pay-as-you-go billing
No long-term commitments	Flexible hourly billing for dedicated instances with no prepayments or contracts required

Availability & Support

Regions

Region list not published on the GPU Instances page; promo mentions Nebraska availability alongside multi-region autoscaling messaging.

Support

Documentation site, dashboard guidance, Discord community link, and contact-sales options.

Getting Started

1
Create an account
Sign up (GitHub-supported) and open the Deep Infra dashboard
2
Enable billing
Add a payment method to unlock GPU rentals and API usage
3
Pick a GPU option
Choose serverless APIs or dedicated A100, H100, H200, B200, or B300 instances
4
Launch and connect
Start instances with SSH access or call the OpenAI-compatible API endpoints
5
Monitor usage
Track spend and instance status from the dashboard and shut down when idle

Compare Providers

Find the best prices for the same GPUs from other providers

CoreWeave

5 shared GPUs with Deep Infra

Compare Prices

RunPod

5 shared GPUs with Deep Infra

Compare Prices

IO.NET

5 shared GPUs with Deep Infra

Compare Prices