Deep Infra

Optimized inference for open-source models

Rapidly-catching neocloud · 🇺🇸 US · inference · open-source · budget

Deep Infra offers serverless AI APIs and dedicated GPU rentals with fast SSH access and low hourly pricing across flagship NVIDIA accelerators.

4 GPU Models · From $0.89/hour

Available GPUs

Hourly on-demand pricing.

Prices last updated: March 21, 2026

| GPU Model | Memory | Price / hr |
|-----------|--------|------------|
| A100 SXM  | 80GB   | $0.89/hr   |
| B200      | 192GB  | $2.49/hr   |
| H100      | 80GB   | $1.69/hr   |
| H200      | 141GB  | $1.99/hr   |
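As a quick way to compare the rates above, a short script can rank the listed GPUs by hourly price per GB of memory (figures taken from the table; prices change, so treat this as illustrative):

```python
# Rank Deep Infra's published GPUs by hourly price per GB of memory.
# Prices and memory sizes come from the pricing table above (March 2026 snapshot).
gpus = {
    "A100 SXM": (0.89, 80),   # ($/hr, GB)
    "B200": (2.49, 192),
    "H100": (1.69, 80),
    "H200": (1.99, 141),
}

# Sort by cost per GB-hour, cheapest first.
ranked = sorted(gpus.items(), key=lambda kv: kv[1][0] / kv[1][1])
for name, (price, mem) in ranked:
    print(f"{name}: ${price / mem:.4f} per GB-hour")
```

By this metric the A100 is the cheapest memory per hour, while the H100's 80GB makes it the most expensive per GB despite its mid-range hourly rate.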

Pros & Cons

Advantages

  • Simple OpenAI-compatible API alongside controllable GPU rentals
  • Competitive hourly rates for flagship NVIDIA GPUs including latest B200
  • Fast provisioning with SSH access for dedicated instances (ready in ~10 seconds)
  • Supports custom deployments in addition to hosted public models

Limitations

  • Region list is not clearly published on the public marketing pages
  • Primarily focused on inference and GPU rentals rather than broader cloud services
  • Newer player compared to established cloud providers

Key Features

Serverless Model APIs

OpenAI-compatible endpoints for 100+ models with autoscaling and pay-per-token billing

Dedicated GPU Rentals

B200 instances with SSH access spin up in about 10 seconds and bill hourly

Custom LLM Deployments

Deploy your own Hugging Face models onto dedicated A100, H100, H200, or B200 GPUs

Transparent GPU Pricing

Published per-GPU hourly rates for A100, H100, H200, and B200 with competitive pricing

Inference-Optimized Hardware

All hosted models run on H100 or A100 hardware tuned for low latency

Compute Services

Serverless Inference

Hosted model APIs with autoscaling on H100/A100 hardware.

  • OpenAI-compatible REST API surface
  • Runs 100+ public models with pay-per-token pricing
  • Autoscaling for low latency without manual instance management
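Because the API surface is OpenAI-compatible, requests follow the familiar chat-completions shape. A minimal sketch of building such a request with the standard library is below; the base URL and model name are assumptions for illustration, so check the Deep Infra documentation for current values:

```python
import json
import urllib.request

# Assumed OpenAI-compatible base URL; verify against the Deep Infra docs.
BASE_URL = "https://api.deepinfra.com/v1/openai"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The model identifier here is illustrative; pick any hosted model from the catalog.
req = build_chat_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!", "YOUR_API_KEY")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires a valid API key from the dashboard; billing is per token, so no instance needs to be provisioned first.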

Dedicated GPU Instances

On-demand GPU nodes with SSH access for custom workloads.

Pricing Options

| Option | Details |
|--------|---------|
| Serverless pay-per-token | OpenAI-compatible inference APIs with pay-per-request billing on H100/A100 hardware |
| Dedicated GPU hourly rates | Published transparent hourly pricing for A100, H100, H200, and B200 GPUs with pay-as-you-go billing |
| No long-term commitments | Flexible hourly billing for dedicated instances with no prepayments or contracts required |
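Since dedicated instances bill hourly with no commitments, cost estimation is simple multiplication. A back-of-the-envelope sketch using the published H100 rate:

```python
# Estimate the cost of an intermittent dedicated-GPU workload under
# pay-as-you-go hourly billing (no contracts or prepayments).
HOURLY_RATE_H100 = 1.69  # $/hr, from the published pricing table

hours_per_day = 8   # e.g. a workday-only training job, shut down when idle
days = 30

monthly_cost = HOURLY_RATE_H100 * hours_per_day * days
print(f"H100, {hours_per_day} h/day for {days} days: ${monthly_cost:.2f}")
```

Because instances can be shut down when idle and restarted in seconds, paying only for active hours (here 240 of 720) cuts the bill to a third of an always-on node.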

Availability & Support

Regions

The GPU Instances page does not publish a region list; promotional material mentions Nebraska availability alongside multi-region autoscaling messaging.

Support

Documentation site, dashboard guidance, Discord community link, and contact-sales options.

Getting Started

  1. Create an account

     Sign up (GitHub-supported) and open the Deep Infra dashboard

  2. Enable billing

     Add a payment method to unlock GPU rentals and API usage

  3. Pick a GPU option

     Choose serverless APIs or dedicated A100, H100, H200, or B200 instances

  4. Launch and connect

     Start instances with SSH access or call the OpenAI-compatible API endpoints

  5. Monitor usage

     Track spend and instance status from the dashboard and shut down when idle

Compare Providers

Find the best prices for the same GPUs from other providers

CoreWeave

4 shared GPUs with Deep Infra

RunPod

4 shared GPUs with Deep Infra

Amazon AWS

4 shared GPUs with Deep Infra