Granite 4.0 H Micro
Granite 4.0 H Micro is IBM's lightweight text model designed for high-speed inference with a 131K token context window.
API Pricing
| Provider | Input / 1M | Output / 1M | Speed | TTFT | Updated |
|---|---|---|---|---|---|
| | $0.017 | $0.110 | 394 t/s | 8.7s | 4/14/2026 |
Prices updated daily. Last check: 4/14/2026
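The per-million-token rates in the table can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, not part of any provider SDK, and the two constants are copied from the table row above, so they will go stale when prices change.

```python
# Per-million-token prices copied from the pricing table (USD, as of 4/14/2026).
# These are assumptions for the example; check the table for current rates.
INPUT_PRICE_PER_M = 0.017
OUTPUT_PRICE_PER_M = 0.110


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token document summarized into 2,000 tokens.
    print(f"${estimate_cost(100_000, 2_000):.6f}")
```

At these rates output tokens cost roughly 6.5 times as much as input tokens, so summarization-style workloads (long input, short output) are the cheapest pattern per request.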
Model Details
General
- Creator: IBM
- Family: Granite
- Tier: Lightweight
- Context Window: 131K
- Modalities: Text
Capabilities
- Tool Calling: No
- Open Source: No
Strengths & Limitations
Strengths
- Fast output generation (394 tokens per second per the pricing table)
- 131K token context window supports long document processing
- Lightweight architecture enables efficient deployment
- IBM enterprise backing with corporate support
- Optimized for high-throughput inference workloads
- Competitive context size for the micro-tier model class
Limitations
- No tool calling or function execution capabilities
- Text-only processing: no image or multimodal support
- High time to first token (8.7 seconds) relative to some peers
- Proprietary model: weights not publicly available
- Limited reasoning capabilities compared to larger models in the Granite family
About Granite 4.0 H Micro
Common Use Cases
Granite 4.0 H Micro suits high-volume text processing where speed and cost efficiency are the priorities. Its fast token generation fits content summarization, document analysis, and automated text generation workflows that process large quantities of material. The 131K context window accommodates long documents, research papers, and extended multi-turn conversations without truncation, provided the input stays within that limit. Organizations running customer service automation, content moderation, or data extraction can benefit from its balance of capability and throughput. It is a good fit when advanced reasoning or tool use is not required but consistent, fast text processing is essential.
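Whether a given document avoids truncation depends on its token count, which can be checked roughly before sending a request. The sketch below rests on two assumptions that are not stated on this page: "131K" is read as 131,072 tokens, and token counts are approximated at about four characters per token, a coarse heuristic for English text rather than the model's actual tokenizer. The function names are hypothetical.

```python
import math

# Assumption: "131K" interpreted as 128 * 1024 tokens.
CONTEXT_WINDOW_TOKENS = 131_072
# Assumption: rough average for English text, not the model's real tokenizer.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_output_tokens: int = 2_000) -> bool:
    """Check whether the prompt plus reserved output space fits the window."""
    return estimate_tokens(text) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 80_000  # 400,000 characters
    print(estimate_tokens(document), fits_in_context(document))
```

For anything close to the limit, counting tokens with the provider's own tokenizer is safer than this estimate.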
Frequently Asked Questions
How much does Granite 4.0 H Micro cost per million tokens?
Pricing varies by provider. As of 4/14/2026, the pricing table above lists $0.017 per million input tokens and $0.110 per million output tokens.
What is Granite 4.0 H Micro best used for?
Granite 4.0 H Micro excels at high-volume text processing tasks requiring fast inference speeds. It's ideal for document summarization, content generation, text analysis, and automated writing workflows where speed and cost efficiency matter more than advanced reasoning capabilities.
How does Granite 4.0 H Micro compare to other lightweight models?
Granite 4.0 H Micro offers a notably large 131K context window for its micro tier, along with fast output generation (394 tokens per second in the pricing table above). However, its time to first token is long at 8.7 seconds, and it lacks the tool calling capabilities that some competing lightweight models provide.
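The trade-off between a long time to first token and fast generation can be made concrete with a back-of-the-envelope latency model: total time is roughly TTFT plus output tokens divided by generation speed. The sketch below assumes a constant decode rate and takes its defaults from the pricing table (8.7 s, 394 t/s); both are parameters, and `estimate_latency` is a hypothetical helper, not a measured benchmark.

```python
# Defaults copied from the pricing table; treat them as assumptions.
TTFT_SECONDS = 8.7
OUTPUT_TOKENS_PER_SECOND = 394.0


def estimate_latency(
    output_tokens: int,
    ttft: float = TTFT_SECONDS,
    tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND,
) -> float:
    """Rough end-to-end seconds for one response, assuming a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft + output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (50, 500, 5_000):
        print(n, round(estimate_latency(n), 2))
```

Under this model the fixed TTFT dominates short responses, so the model's speed advantage only shows up on long outputs or in batch workloads where first-token delay matters less.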