
Deep Infra
Optimized inference for open-source models
Deep Infra offers serverless AI APIs and dedicated GPU rentals with fast SSH access and low hourly pricing across flagship NVIDIA accelerators.
Available GPUs
Hourly on-demand pricing. Click column headers to sort.
Prices last updated: March 21, 2026
Pros & Cons
Advantages
- Simple OpenAI-compatible API alongside controllable GPU rentals
- Competitive hourly rates for flagship NVIDIA GPUs including latest B200
- Fast provisioning with SSH access for dedicated instances (ready in ~10 seconds)
- Supports custom deployments in addition to hosted public models
Limitations
- Region list is not clearly published in the public marketing pages
- Primarily focused on inference and GPU rentals rather than broader cloud services
- Newer player compared to established cloud providers
Key Features
Serverless Model APIs
OpenAI-compatible endpoints for 100+ models with autoscaling and pay-per-token billing
Dedicated GPU Rentals
B200 instances with SSH access spin up in about 10 seconds and bill hourly
Custom LLM Deployments
Deploy your own Hugging Face models onto dedicated A100, H100, H200, or B200 GPUs
Transparent GPU Pricing
Published per-GPU hourly rates for A100, H100, H200, and B200 with competitive pricing
Inference-Optimized Hardware
All hosted models run on H100 or A100 hardware tuned for low latency
Compute Services
Serverless Inference
Hosted model APIs with autoscaling on H100/A100 hardware.
- OpenAI-compatible REST API surface
- Runs 100+ public models with pay-per-token pricing
- Autoscaling for low latency without manual instance management
Dedicated GPU Instances
On-demand GPU nodes with SSH access for custom workloads.
Pricing Options
| Option | Details |
|---|---|
| Serverless pay-per-token | OpenAI-compatible inference APIs with pay-per-request billing on H100/A100 hardware |
| Dedicated GPU hourly rates | Published transparent hourly pricing for A100, H100, H200, and B200 GPUs with pay-as-you-go billing |
| No long-term commitments | Flexible hourly billing for dedicated instances with no prepayments or contracts required |
Availability & Support
Regions
Region list not published on the GPU Instances page; promo mentions Nebraska availability alongside multi-region autoscaling messaging.
Support
Documentation site, dashboard guidance, Discord community link, and contact-sales options.
Getting Started
- 1
Create an account
Sign up (GitHub-supported) and open the Deep Infra dashboard
- 2
Enable billing
Add a payment method to unlock GPU rentals and API usage
- 3
Pick a GPU option
Choose serverless APIs or dedicated A100, H100, H200, or B200 instances
- 4
Launch and connect
Start instances with SSH access or call the OpenAI-compatible API endpoints
- 5
Monitor usage
Track spend and instance status from the dashboard and shut down when idle
Compare Providers
Find the best prices for the same GPUs from other providers