Deep Infra
Deep Infra offers serverless AI APIs and dedicated GPU rentals with fast SSH access and low hourly pricing across flagship NVIDIA accelerators.
Key Features
- Serverless Model APIs
- OpenAI-compatible endpoints for 100+ models with autoscaling and pay-per-token billing
- Dedicated GPU Rentals
- B200 instances with SSH access spin up in about 10 seconds and bill hourly
- Custom LLM Deployments
- Deploy your own Hugging Face models onto dedicated A100, H100, H200, or B200 GPUs
- Transparent GPU Pricing
- Published per-GPU hourly rates for A100, H100, H200, and B200 with competitive pricing
- Inference-Optimized Hardware
- All hosted models run on H100 or A100 hardware tuned for low latency
Provider Comparison
Advantages
- Simple OpenAI-compatible API alongside controllable GPU rentals
- Competitive hourly rates for flagship NVIDIA GPUs including latest B200
- Fast provisioning with SSH access for dedicated instances (ready in ~10 seconds)
- Supports custom deployments in addition to hosted public models
Limitations
- Region list is not clearly published in the public marketing pages
- Primarily focused on inference and GPU rentals rather than broader cloud services
- Newer player compared to established cloud providers
Available GPUs
GPU Modelโ | Memory | Hourly Price |
|---|---|---|
A100 SXM | 80GB | $0.89/hr |
B200 | 192GB | $2.49/hr |
H100 | 80GB | $1.69/hr |
H200 | 141GB | $1.99/hr |
Compute Services
Serverless Inference
Hosted model APIs with autoscaling on H100/A100 hardware.
Features
- OpenAI-compatible REST API surface
- Runs 100+ public models with pay-per-token pricing
- Autoscaling for low latency without manual instance management
Dedicated GPU Instances
On-demand GPU nodes with SSH access for custom workloads.
Pricing Options
| Option | Details |
|---|---|
| Serverless pay-per-token | OpenAI-compatible inference APIs with pay-per-request billing on H100/A100 hardware |
| Dedicated GPU hourly rates | Published transparent hourly pricing for A100, H100, H200, and B200 GPUs with pay-as-you-go billing |
| No long-term commitments | Flexible hourly billing for dedicated instances with no prepayments or contracts required |
Getting Started
1
Create an account
Sign up (GitHub-supported) and open the Deep Infra dashboard
2
Enable billing
Add a payment method to unlock GPU rentals and API usage
3
Pick a GPU option
Choose serverless APIs or dedicated A100, H100, H200, or B200 instances
4
Launch and connect
Start instances with SSH access or call the OpenAI-compatible API endpoints
5
Monitor usage
Track spend and instance status from the dashboard and shut down when idle
Compare Providers
Find the best prices for the same GPUs from other providers