Deep Infra
Deep Infra offers serverless AI APIs and dedicated GPU rentals with fast SSH access and low hourly pricing across flagship NVIDIA accelerators.
Key Features
- Serverless Model APIs
- OpenAI-compatible endpoints for 100+ models with autoscaling and pay-per-token billing
- Dedicated GPU Rentals
- B200 instances with SSH access spin up in about 10 seconds and bill hourly
- Custom LLM Deployments
- Deploy your own Hugging Face models onto dedicated A100, H100, H200, or B200 GPUs
- Transparent GPU Pricing
- Published per-GPU rates: A100 $0.89/hr, H100 $1.69/hr, H200 $1.99/hr, B200 $2.49/hr promo
- Inference-Optimized Hardware
- All hosted models run on H100 or A100 hardware tuned for low latency
Provider Comparison
Advantages
- Simple OpenAI-compatible API alongside controllable GPU rentals
- Competitive hourly rates for flagship NVIDIA GPUs including B200 promo pricing
- Fast provisioning with SSH access for dedicated instances
- Supports custom deployments in addition to hosted public models
Limitations
- Region list is not clearly published in the public marketing pages
- Primarily focused on inference and GPU rentals rather than broader cloud services
- B200 promo pricing is time-limited per site note
Available GPUs
GPU Modelโ | Memory | Hourly Price |
|---|---|---|
A100 SXM | 80GB | $1.50/hr |
H100 | 80GB | $2.40/hr |
H200 | 141GB | $3.00/hr |
Compute Services
Serverless Inference
Hosted model APIs with autoscaling on H100/A100 hardware.
Features
- OpenAI-compatible REST API surface
- Runs 100+ public models with pay-per-token pricing
- Autoscaling for low latency without manual instance management
Dedicated GPU Instances
On-demand GPU nodes with SSH access for custom workloads.
Pricing Options
| Option | Details |
|---|---|
| Serverless pay-per-token | OpenAI-compatible inference APIs with pay-per-request billing on H100/A100 hardware |
| Dedicated GPU hourly rates | Published pricing: A100 $0.89/hr, H100 $1.69/hr, H200 $1.99/hr, B200 $2.49/hr promo (then $4.49/hr) |
| B200 GPU rentals | SSH-accessible B200 nodes with flexible hourly billing and promo pricing noted on the site |
Getting Started
1
Create an account
Sign up (GitHub-supported) and open the Deep Infra dashboard
2
Enable billing
Add a payment method to unlock GPU rentals and API usage
3
Pick a GPU option
Choose serverless APIs or dedicated A100, H100, H200, or B200 instances
4
Launch and connect
Start instances with SSH access or call the OpenAI-compatible API endpoints
5
Monitor usage
Track spend and instance status from the dashboard and shut down when idle
Compare Providers
Find the best prices for the same GPUs from other providers